├── requirements.txt
├── .gitignore
├── README.md
└── pumpfun_research.py

/requirements.txt:
--------------------------------------------------------------------------------
1 | requests
2 | pandas
3 | tqdm
4 | tenacity
5 | python-dateutil
6 | scipy
7 | python-dotenv
8 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Python cache and build artifacts
2 | __pycache__/
3 | *.py[cod]
4 | *.so
5 | *.egg
6 | *.egg-info/
7 | dist/
8 | build/
9 | 
10 | # Virtual environments
11 | .venv/
12 | venv/
13 | ENV/
14 | 
15 | # IDE/editor directories
16 | .vscode/
17 | .idea/
18 | *.swp
19 | 
20 | # Environment and local config
21 | .env
22 | .env.*
23 | *.local
24 | .DS_Store
25 | 
26 | # Logs and data dumps
27 | *.log
28 | *.csv
29 | *.tsv
30 | *.sqlite3
31 | 
32 | # Tooling caches
33 | .cache/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pumpfun-research
2 | 
3 | Python utilities for analysing a Solana wallet's pump.fun trades and Raydium Launchpad launches. The tool batches Helius-enhanced transaction data, classifies BUY/SELL flows, and aggregates per-mint performance metrics with optional metadata enrichment.
4 | 
5 | ## Features
6 | - Batch pulls pump.fun BUY/SELL activity for a wallet and computes PnL, delay-to-first-sell, and summary stats.
7 | - Raydium Launchpad detection that combines instruction-program fingerprints with log keywords to flag launches while minimising RPC calls.
8 | - Optional "pump.fun + Raydium" mode that unions both sets of mints in a single run.
9 | - CSV export with human-readable timestamps plus optional full mint transaction logs.
10 | - Mode-aware caching (`.cache/<wallet>_{pumpfun|pumpfun_plus_raydium|raydium_launchpad}`) with automatic resume, program-ID persistence, and parallelised log scans for fast re-runs.
11 | 
12 | ## Requirements
13 | - Python 3.10+
14 | - A Solana RPC endpoint (`RPC_URL`); when unset, one is derived from the Helius key, with the public mainnet endpoint as a last resort
15 | - A Helius API key (`HELIUS_API_KEY`): required for enhanced transaction decoding (the script exits without one) and for DAS metadata
16 | 
17 | Install dependencies with:
18 | 
19 | ```bash
20 | python -m venv .venv
21 | source .venv/bin/activate
22 | pip install -r requirements.txt
23 | ```
24 | 
25 | ## Configuration
26 | Create a `.env` file (or set environment variables) with your connection details:
27 | 
28 | ```bash
29 | HELIUS_API_KEY=your_helius_key   # required for enhanced decoding and DAS metadata
30 | RPC_URL=https://your.solana.rpc  # optional; derived from the Helius key (or public mainnet) when unset
31 | ```
32 | 
33 | The script loads `.env` automatically on startup.
34 | 
35 | ## Usage
36 | ### Basic pump.fun scan
37 | ```bash
38 | python pumpfun_research.py --wallet <WALLET> [--limit 25000] [--days 7] [--max-mints 50] [--fetch-metadata]
39 | ```
40 | Key flags:
41 | - `--wallet` *(required)*: Base58 wallet to analyse.
42 | - `--limit`: Maximum signatures to fetch (default 25,000).
43 | - `--days`: Restrict history to the last *N* days.
44 | - `--max-mints`: Stop after analysing *N* token mints (fast tests).
45 | - `--fetch-metadata`: Pull token `name` / `symbol` from Helius DAS (slower).
46 | - `--helius-workers`: Concurrent Helius batch requests (defaults to up to 4 based on CPU count). Increase to speed up large runs; set to `1` to disable parallel decoding if you hit rate limits.
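
For a quick smoke test before committing to a full scan, a run along these lines (with `<WALLET>` as a placeholder for the wallet you are studying) keeps the workload small:

```bash
# Scan only the last 7 days, stop after 20 mints, and fetch name/symbol metadata.
python pumpfun_research.py --wallet <WALLET> --days 7 --max-mints 20 --fetch-metadata
```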
47 | 
48 | ### Raydium Launchpad mode
49 | ```bash
50 | python pumpfun_research.py --wallet <WALLET> --raydium-launchpad
51 | ```
52 | The Raydium filter looks for launch-related log keywords (`raydium launchpad`, `acceleraytor`, etc.) plus known launch/trigger program IDs. Instruction fingerprints are cached per signature, so subsequent runs can usually skip extra RPC calls. Default heuristics are defined inside `pumpfun_research.py` (`RAYDIUM_LAUNCHPAD_*` constants) and can be edited directly if you have better identifiers.
53 | 
54 | ### Combined pump.fun + Raydium run
55 | ```bash
56 | python pumpfun_research.py --wallet <WALLET> --include-raydium-launchpad
57 | ```
58 | This mode keeps pump.fun mints and any Raydium Launchpad coins detected for the wallet in one CSV. The summary footer reports how many of each were found.
59 | 
60 | ### Export full mint transaction logs
61 | Add `--export-mint-txs` to persist every decoded token transfer touching each mint. By default the exporter scans all signatures; constrain it with `--mint-tx-limit <N>` or redirect output with `--mint-tx-output-dir`.
62 | 
63 | ```bash
64 | python pumpfun_research.py \
65 |   --wallet <WALLET> \
66 |   --include-raydium-launchpad \
67 |   --export-mint-txs \
68 |   --mint-tx-limit 0 \
69 |   --helius-workers 4 \
70 |   --days 420
71 | ```
72 | The command above scans roughly 420 days of history, unions pump.fun and Raydium mints, and saves per-mint CSVs under `mint_txs/<wallet>/`. Use `--mint-tx-limit 0` for an unlimited scan (omitting the flag has the same effect when `--export-mint-txs` is given); a positive number caps per-mint history, and a negative value disables export even when `--export-mint-txs` is present. A zero or positive `--mint-tx-limit` also enables the exporter by itself.
73 | 
74 | ### Caching behaviour
75 | - Cache folders are mode-specific: `.cache/<wallet>_pumpfun`, `.cache/<wallet>_pumpfun_plus_raydium`, or `.cache/<wallet>_raydium_launchpad`.
76 | - Each cache contains `signatures.json`, decoded Helius batches, the aggregated token events, plus `signature_programs.json` (instruction IDs) for fast launchpad filtering.
77 | - If a matching cache already exists the script auto-resumes; you can point at a custom location via `--cache-dir` or `--resume-cache`.
78 | - Legacy caches without the mode suffix are detected automatically; you can rename them yourself or let the tool reuse them on the next run.
79 | 
80 | ## Output
81 | The script prints summary statistics to stdout and writes a CSV per run:
82 | - `<wallet>_pumpfun.csv` for pump.fun-only scans
83 | - `<wallet>_pumpfun_plus_raydium.csv` when `--include-raydium-launchpad` is supplied
84 | - `<wallet>_raydium_launchpad.csv` when `--raydium-launchpad` is supplied
85 | 
86 | CSV columns include mint address, optional name/symbol, first buy/sell timestamps, buy amounts, net profit in SOL, and more.
87 | 
88 | When `--export-mint-txs` is enabled, an additional CSV is produced per mint under `mint_txs/<wallet>/`, capturing from/to users, token amounts, and SOL movements for every transfer involving that mint.
89 | 
90 | ## Customising Raydium detection
91 | Edit the following constants in `pumpfun_research.py` to suit your heuristics:
92 | - `RAYDIUM_LAUNCHPAD_KEYWORDS`
93 | - `RAYDIUM_LAUNCHPAD_PROGRAM_IDS`
94 | - `RAYDIUM_LAUNCHPAD_TRIGGER_PROGRAM_IDS`
95 | 
96 | Keywords are lowercase substrings matched against transaction logs. Program ID sets can be extended with authoritative IDs when available.
97 | 
98 | ## Development notes
99 | - Keep the virtual environment and `.env` out of version control (already listed in `.gitignore`).
100 | - Preferred formatting is standard `black`/`ruff` tooling, though no formatter is enforced.
101 | - Use `PYTHONPYCACHEPREFIX=.pycache python3 -m compileall pumpfun_research.py` for a quick syntax check.
102 | 
103 | ## License
104 | MIT (unless stated otherwise in individual files).
105 | 
--------------------------------------------------------------------------------
/pumpfun_research.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | Analyze a Solana wallet's pump.fun + Raydium Launchpad activity:
4 | - Find pump.fun BUY/SELL events by the wallet, grouped per token mint
5 | - Compute time from first BUY -> first SELL (seconds)
6 | - Compute cost, proceeds, net profit (incl. fees) per token
7 | - Optional Raydium Launchpad filter (best-effort log detection)
8 | - Summaries + correlations, CSV export
9 | 
10 | Inputs:
11 |   --wallet             Base58 pubkey of the dev wallet
12 |   --limit              Max signatures to scan (default 25000)
13 |   --days               Restrict history to last N days
14 |   --fetch-metadata     Also fetch token name/symbol via Helius DAS (slower)
15 |   --max-mints          Stop after analyzing this many mints (for quicker tests)
16 |   --raydium-launchpad  Restrict output to coins this wallet launched on Raydium Launchpad (best-effort log detection)
17 | Env:
18 |   HELIUS_API_KEY       If set, used for Enhanced TX + DAS (recommended)
19 |   RPC_URL              JSON-RPC endpoint (optional; derived from the Helius key when unset)
20 | """
21 | 
22 | import os
23 | import sys
24 | import time
25 | import math
26 | import json
27 | import argparse
28 | import traceback
29 | import threading
30 | from concurrent.futures import ThreadPoolExecutor, as_completed
31 | from datetime import datetime, timedelta, timezone
32 | from pathlib import Path
33 | from typing import Dict, List, Any, Tuple, Optional, Set
34 | import requests
35 | from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
36 | import pandas as pd
37 | from tqdm import tqdm
38 | from dateutil import tz
39 | from scipy.stats import pearsonr, spearmanr
40 | from dotenv import load_dotenv
41 | 
42 | PUMP_FUN_PROGRAM_ID = "6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P"  # pump.fun program
43 | 
44 | WSOL_MINT = "So11111111111111111111111111111111111111112"
45 | 
46 | # ---------- Config helpers ----------
47 | 
48 | load_dotenv()  # load settings from .env if present
49 | 
50 | def now_utc() -> int:
51 |     return int(time.time())
52 | 
53 | def days_ago_epoch(days: int) -> int:
54 |     return now_utc() - days * 86400
55 | 
56 | def get_env(name: str, default: Optional[str] = None) -> Optional[str]:
57 |     val = os.getenv(name)
58 |     return val if val else default
59 | 
60 | HELIUS_API_KEY = get_env("HELIUS_API_KEY")
61 | RPC_URL = get_env("RPC_URL") or (f"https://mainnet.helius-rpc.com/?api-key={HELIUS_API_KEY}" if HELIUS_API_KEY else "https://api.mainnet-beta.solana.com")  # explicit RPC_URL wins; otherwise derive from the Helius key, else public mainnet
62 | 
63 | if not RPC_URL:
64 |     print("ERROR: Please set RPC_URL (or provide HELIUS_API_KEY to derive one).", file=sys.stderr)
65 |     sys.exit(1)
66 | 
67 | if HELIUS_API_KEY:
68 |     HELIUS_ENHANCED_BY_ADDR = f"https://api.helius.xyz/v0/addresses/{{address}}/transactions?api-key={HELIUS_API_KEY}"
69 |     HELIUS_TRANSACTIONS = f"https://api.helius.xyz/v0/transactions?api-key={HELIUS_API_KEY}"
70 |     HELIUS_DAS = f"https://mainnet.helius-rpc.com/?api-key={HELIUS_API_KEY}"
71 | else:
72 |     HELIUS_ENHANCED_BY_ADDR = None
73 |     HELIUS_TRANSACTIONS = None
74 |     HELIUS_DAS = None
75 | 
76 | _THREAD_LOCAL = threading.local()
77 | 
78 | 
79 | def _get_session() -> requests.Session:
80 |     sess = getattr(_THREAD_LOCAL, "session",
None) 81 | if sess is None: 82 | sess = requests.Session() 83 | sess.headers.update({"Content-Type": "application/json"}) 84 | _THREAD_LOCAL.session = sess 85 | return sess 86 | 87 | 88 | class CacheManager: 89 | def __init__(self, root: Optional[str], read_only: bool = False): 90 | self.root = Path(root).expanduser() if root else None 91 | self.read_only = read_only 92 | if self.root and not self.read_only: 93 | self.root.mkdir(parents=True, exist_ok=True) 94 | 95 | def _path(self, name: str) -> Optional[Path]: 96 | if not self.root: 97 | return None 98 | return self.root / name 99 | 100 | def load_json(self, name: str) -> Optional[Any]: 101 | path = self._path(name) 102 | if not path or not path.exists(): 103 | return None 104 | with path.open("r", encoding="utf-8") as fh: 105 | return json.load(fh) 106 | 107 | def save_json(self, name: str, data: Any) -> None: 108 | if self.read_only: 109 | return 110 | path = self._path(name) 111 | if not path: 112 | return 113 | with path.open("w", encoding="utf-8") as fh: 114 | json.dump(data, fh) 115 | 116 | class RpcError(Exception): 117 | pass 118 | 119 | 120 | def sanitize_for_fs(value: str) -> str: 121 | """Return a filesystem-safe name derived from the supplied value.""" 122 | allowed = [] 123 | for ch in value.strip(): 124 | if ch.isalnum() or ch in ("-", "_", "."): 125 | allowed.append(ch) 126 | else: 127 | allowed.append("_") 128 | name = "".join(allowed) 129 | return name or "wallet" 130 | 131 | 132 | def _serialize_program_map(prog_map: Dict[str, Set[str]]) -> Dict[str, List[str]]: 133 | return {sig: sorted(list(progs)) for sig, progs in prog_map.items()} 134 | 135 | # ---------- RPC helpers ---------- 136 | 137 | @retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=0.5, min=0.5, max=6), reraise=True) 138 | def rpc(method: str, params: Any) -> Any: 139 | """Generic JSON-RPC""" 140 | payload = {"jsonrpc": "2.0", "id": "x", "method": method, "params": params} 141 | r = _get_session().post(RPC_URL, data=json.dumps(payload), timeout=30) 142 | if r.status_code != 200: 143 | raise RpcError(f"RPC HTTP {r.status_code}: {r.text}") 144 | out = r.json() 145 | if "error" in out: 146 | raise RpcError(out["error"]) 147 | return out["result"] 148 | 149 | def get_signatures_for_address(address: str, before: Optional[str], limit: int) -> List[Dict[str, Any]]: 150 | params = [address, {"limit": limit}] 151 | if before: 152 | params[1]["before"] = before 153 | return rpc("getSignaturesForAddress", params) 154 | 155 | def get_transaction(signature: str, max_version: int = 0) -> Dict[str, Any]: 156 | return rpc("getTransaction", [signature, {"maxSupportedTransactionVersion": max_version}]) 157 | 158 | # ---------- Helius Enhanced helpers ---------- 159 | 160 | @retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=0.5, min=0.5, max=6), reraise=True) 161 | def helius_get_transactions_by_address(address: str, before: Optional[str], limit: int = 1000) -> List[Dict[str, Any]]: 162 | assert HELIUS_ENHANCED_BY_ADDR 163 | url = HELIUS_ENHANCED_BY_ADDR.format(address=address) 164 | q = {"limit": limit} 165 | if before: 166 | q["before"] = before 167 | r = _get_session().get(url, params=q, timeout=30) 168 | r.raise_for_status() 169 | return r.json() 170 | 171 | @retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=0.5, min=0.5, max=6), reraise=True) 172 | def helius_get_transactions_batch(signatures: List[str]) -> List[Dict[str, Any]]: 173 | assert HELIUS_TRANSACTIONS 174 | r = _get_session().post(HELIUS_TRANSACTIONS, 
data=json.dumps({"transactions": signatures}), timeout=60)
175 |     r.raise_for_status()
176 |     return r.json()
177 | 
178 | def helius_get_asset(mint: str) -> Optional[Dict[str, Any]]:
179 |     if not HELIUS_DAS:
180 |         return None
181 |     body = {"jsonrpc": "2.0", "id": "getAsset", "method": "getAsset", "params": {"id": mint}}
182 |     r = _get_session().post(HELIUS_DAS, data=json.dumps(body), timeout=20)
183 |     if r.status_code != 200:
184 |         return None
185 |     j = r.json()
186 |     return j.get("result")
187 | 
188 | # ---------- Launchpad detection helpers ----------
189 | # Goal: best-effort inference of coins created via Raydium Launchpad by this wallet.
190 | # Strategy:
191 | #   1) Scan the wallet's transactions (signatures we already fetched) with getTransaction.
192 | #   2) Require Raydium launch-specific program IDs and/or log keywords.
193 | #   3) Extract candidate mints from meta.postTokenBalances; intersect with the per-mint analysis set.
194 | # Tuning via module-level constants (edit them directly in this file):
195 | #   RAYDIUM_LAUNCHPAD_KEYWORDS             keywords searched for in log messages (lowercase)
196 | #   RAYDIUM_LAUNCHPAD_PROGRAM_IDS          Raydium launchpad program IDs (optional)
197 | #   RAYDIUM_LAUNCHPAD_TRIGGER_PROGRAM_IDS  additional programs whose presence hints at launchpad flows
198 | 
199 | RAYDIUM_LAUNCHPAD_KEYWORDS: List[str] = [
200 |     "raydium launchpad",
201 |     "acceleraytor",
202 |     "raydium accelerator",
203 |     "raydium ido",
204 |     "instruction: launch",
205 | ]
206 | 
207 | # Program IDs observed in Raydium launch flows. Extend/tune as better references surface.
208 | RAYDIUM_LAUNCHPAD_PROGRAM_IDS: Set[str] = {
209 |     "LanMV9sAd7wArD4vJFi2qDdfnVhFxYSUg6eADduJ3uj",  # launchpad orchestrator
210 |     "pfeeUxB6jkeY1Hxd7CsFCAjcbHA9rWtchMGdZ6VojVZ",  # Raydium launch instruction router
211 | }
212 | 
213 | # Trigger programs that frequently appear alongside launchpad activity.
214 | RAYDIUM_LAUNCHPAD_TRIGGER_PROGRAM_IDS: Set[str] = { 215 | "RVKd61ztZW9uA1s4x1tChX7Pyip8nHqUrw1uuJAvV5F", # Raydium AMM 216 | "dbcij3LWUppWqq96dh6gJWwBifmcGfLSB5D4DuSMaqN", # Raydium authority utility program 217 | } 218 | 219 | def _account_keys_as_strs(tx: Dict[str, Any]) -> List[str]: 220 | """Return all account key base58 strings from a getTransaction result.""" 221 | msg = (tx.get("transaction", {}) or {}).get("message", {}) 222 | keys = msg.get("accountKeys", []) or [] 223 | out: List[str] = [] 224 | for k in keys: 225 | if isinstance(k, str): 226 | out.append(k) 227 | elif isinstance(k, dict) and k.get("pubkey"): 228 | out.append(k["pubkey"]) 229 | return out 230 | 231 | def _logs_concat_lower(tx: Dict[str, Any]) -> str: 232 | logs = ((tx.get("meta", {}) or {}).get("logMessages", []) or []) 233 | return "\n".join([str(x) for x in logs]).lower() 234 | 235 | def _extract_mints_from_tx(tx: Dict[str, Any]) -> Set[str]: 236 | mints: Set[str] = set() 237 | meta = (tx.get("meta", {}) or {}) 238 | for b in (meta.get("postTokenBalances", []) or []): 239 | mint = b.get("mint") 240 | if mint: 241 | mints.add(mint) 242 | return mints 243 | 244 | 245 | def _collect_program_ids(tx: Dict[str, Any]) -> Set[str]: 246 | programs: Set[str] = set() 247 | for ins in (tx.get("instructions") or []): 248 | pid = ins.get("programId") 249 | if pid: 250 | programs.add(pid) 251 | for inner in ins.get("innerInstructions") or []: 252 | pid_inner = inner.get("programId") 253 | if pid_inner: 254 | programs.add(pid_inner) 255 | return programs 256 | 257 | def detect_launchpad_mints(wallet: str, 258 | signatures: List[str], 259 | candidate_mints: Optional[Set[str]] = None, 260 | signature_mints: Optional[Dict[str, Set[str]]] = None, 261 | signature_programs: Optional[Dict[str, Set[str]]] = None, 262 | required_program_ids: Optional[Set[str]] = None, 263 | trigger_program_ids: Optional[Set[str]] = None, 264 | keywords: Optional[List[str]] = None, 265 | tqdm_label: str = "launchpad", 266 | max_rpc_workers: int = 8) -> Set[str]: 267 | """Generic launchpad detector driven by program IDs / log keywords. 268 | 269 | Fast path: use cached program IDs to accept/reject signatures without extra RPC. 270 | Slow path: fetch remaining transactions (optionally in parallel) to inspect logs. 
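
    Acceptance rule, mirroring the implementation below: a signature is kept when
    its cached (or freshly fetched) program/account-key set intersects
    required_program_ids or trigger_program_ids, or when any lowercase keyword
    appears in its concatenated log messages. Mints then come from
    signature_mints or meta.postTokenBalances, intersected with candidate_mints
    when one is supplied.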
271 | """ 272 | required_program_ids = set(required_program_ids or []) 273 | trigger_program_ids = set(trigger_program_ids or []) 274 | keywords = [kw.lower() for kw in (keywords or [])] 275 | 276 | candidate_mints = set(candidate_mints or []) 277 | signature_mints = signature_mints or {} 278 | 279 | if signature_programs is None: 280 | program_map: Dict[str, Set[str]] = {} 281 | else: 282 | program_map = signature_programs 283 | for sig, progs in list(program_map.items()): 284 | if not isinstance(progs, set): 285 | program_map[sig] = set(progs) 286 | 287 | found: Set[str] = set() 288 | pending_mints: Dict[str, Set[str]] = {} 289 | to_fetch: List[str] = [] 290 | to_fetch_set: Set[str] = set() 291 | 292 | for sig in tqdm(signatures, desc=f"Pre-scanning {tqdm_label}", leave=False): 293 | mints = set(signature_mints.get(sig, set())) 294 | if candidate_mints: 295 | mints = mints.intersection(candidate_mints) 296 | pending_mints[sig] = mints 297 | 298 | programs = program_map.get(sig) 299 | if programs is None: 300 | programs = set() 301 | program_map[sig] = programs 302 | 303 | has_required = bool(required_program_ids.intersection(programs)) if required_program_ids else False 304 | has_trigger = bool(trigger_program_ids.intersection(programs)) if trigger_program_ids else False 305 | 306 | if has_required or has_trigger: 307 | if mints: 308 | found.update(mints) 309 | continue 310 | # Need the transaction to determine mints; defer to RPC fetch 311 | if sig not in to_fetch_set: 312 | to_fetch_set.add(sig) 313 | to_fetch.append(sig) 314 | continue 315 | 316 | if not keywords: 317 | continue 318 | 319 | if sig not in to_fetch_set: 320 | to_fetch_set.add(sig) 321 | to_fetch.append(sig) 322 | 323 | if not to_fetch: 324 | return found 325 | 326 | workers = max(1, min(max_rpc_workers, len(to_fetch))) 327 | fetch_desc = f"{tqdm_label} RPC" 328 | 329 | with ThreadPoolExecutor(max_workers=workers) as pool: 330 | future_map = {pool.submit(get_transaction, sig): sig for sig in to_fetch} 331 | for fut in tqdm(as_completed(future_map), total=len(future_map), desc=fetch_desc, leave=False): 332 | sig = future_map[fut] 333 | try: 334 | tx = fut.result() 335 | except Exception: 336 | print(f"ERROR: failed to fetch signature {sig} during {tqdm_label} detection", file=sys.stderr) 337 | traceback.print_exc() 338 | continue 339 | if not tx: 340 | print(f"WARNING: getTransaction returned null for {sig}; skipping", file=sys.stderr) 341 | continue 342 | 343 | programs = program_map.setdefault(sig, set()) 344 | programs.update(_collect_program_ids(tx)) 345 | 346 | account_keys = set(_account_keys_as_strs(tx)) 347 | key_union = programs.union(account_keys) 348 | has_required = bool(required_program_ids.intersection(key_union)) if required_program_ids else False 349 | has_trigger = bool(trigger_program_ids.intersection(key_union)) if trigger_program_ids else False 350 | logs_text = _logs_concat_lower(tx) if keywords else "" 351 | has_keyword = any(kw in logs_text for kw in keywords) if keywords else False 352 | 353 | if not (has_required or has_trigger or has_keyword): 354 | continue 355 | 356 | mints = pending_mints.get(sig) 357 | if not mints: 358 | mints = _extract_mints_from_tx(tx) 359 | if candidate_mints: 360 | mints = mints.intersection(candidate_mints) 361 | if not mints: 362 | continue 363 | found.update(mints) 364 | 365 | return found 366 | 367 | # ---------- Core logic ---------- 368 | 369 | def tx_uses_program(enhanced_tx: Dict[str, Any], program_ids: Set[str]) -> bool: 370 | """Check top-level + inner 
instructions for any of the specified programs.""" 371 | if not program_ids: 372 | return False 373 | target = set(program_ids) 374 | for ins in enhanced_tx.get("instructions", []): 375 | pid = ins.get("programId") 376 | if pid in target: 377 | return True 378 | for inner in ins.get("innerInstructions", []) or []: 379 | if inner.get("programId") in target: 380 | return True 381 | return False 382 | 383 | 384 | def is_pump_tx(enhanced_tx: Dict[str, Any]) -> bool: 385 | """Check if any instruction (or inner) uses the pump.fun program.""" 386 | return tx_uses_program(enhanced_tx, {PUMP_FUN_PROGRAM_ID}) 387 | 388 | def native_change_for_wallet(enhanced_tx: Dict[str, Any], wallet: str) -> int: 389 | """Lamports delta for wallet in this tx (includes fees).""" 390 | for acct in enhanced_tx.get("accountData", []): 391 | if acct.get("account") == wallet: 392 | return int(acct.get("nativeBalanceChange") or 0) 393 | return 0 394 | 395 | def token_changes_for_wallet(enhanced_tx: Dict[str, Any], wallet: str) -> List[Dict[str, Any]]: 396 | out = [] 397 | for acct in enhanced_tx.get("accountData", []): 398 | for tbc in acct.get("tokenBalanceChanges", []) or []: 399 | if tbc.get("userAccount") == wallet: 400 | out.append(tbc) 401 | return out 402 | 403 | def classify_event(sol_delta_lamports: int, token_delta_raw: int) -> Optional[str]: 404 | """ 405 | BUY: token +, SOL -, SELL: token -, SOL + 406 | (Return None if inconsistent) 407 | """ 408 | if token_delta_raw > 0 and sol_delta_lamports < 0: 409 | return "BUY" 410 | if token_delta_raw < 0 and sol_delta_lamports > 0: 411 | return "SELL" 412 | # mixed signals (skip) 413 | return None 414 | 415 | def lamports_to_sol(lamports: int) -> float: 416 | return lamports / 1e9 417 | 418 | def parse_ts(e: Dict[str, Any]) -> int: 419 | return int(e.get("timestamp") or 0) 420 | 421 | def analyze_wallet(wallet: str, 422 | limit: int = 25000, 423 | days: Optional[int] = None, 424 | fetch_metadata: bool = False, 425 | max_mints: Optional[int] = None, 426 | raydium_launchpad_only: bool = False, 427 | include_raydium_launchpad: bool = False, 428 | cache_dir: Optional[str] = None, 429 | resume_cache: Optional[str] = None, 430 | helius_workers: int = 1) -> Tuple[pd.DataFrame, Dict[str, Any], Dict[str, Any]]: 431 | """ 432 | Returns: 433 | df (per-mint summary), stats dict, and metadata (e.g., per-mint decimals) 434 | """ 435 | cache = CacheManager(cache_dir) if cache_dir else None 436 | resume = CacheManager(resume_cache, read_only=True) if resume_cache else None 437 | 438 | if cache and not resume: 439 | print(f"Caching decoded data under {cache.root}") 440 | if resume: 441 | print(f"Resuming from cache {resume.root}") 442 | 443 | if raydium_launchpad_only or include_raydium_launchpad: 444 | program_filter_ids: Optional[Set[str]] = None 445 | else: 446 | program_filter_ids = {PUMP_FUN_PROGRAM_ID} 447 | 448 | token_events: List[Dict[str, Any]] = [] 449 | sig_list: List[str] = [] 450 | signature_program_ids: Dict[str, Set[str]] = {} 451 | raydium_launchpad_mints: Set[str] = set() 452 | mint_decimals: Dict[str, int] = {} 453 | 454 | if resume: 455 | cached_sig_list = resume.load_json("signatures.json") 456 | cached_events = resume.load_json("pump_events.json") 457 | meta = resume.load_json("cache_meta.json") 458 | if cached_sig_list is None or cached_events is None or meta is None: 459 | print("ERROR: resume cache missing required files (signatures.json, pump_events.json, cache_meta.json).", file=sys.stderr) 460 | sys.exit(3) 461 | if not meta.get("complete"): 462 | 
print("ERROR: resume cache not marked complete; re-run without --resume-cache or rebuild the cache.", file=sys.stderr) 463 | sys.exit(4) 464 | sig_list = cached_sig_list 465 | token_events = cached_events 466 | expected = meta.get("signature_count") 467 | if expected and expected != len(sig_list): 468 | print("ERROR: resume cache signature count mismatch.", file=sys.stderr) 469 | sys.exit(5) 470 | print(f"Loaded {len(sig_list)} signatures and {len(token_events)} cached token events.") 471 | 472 | cached_programs = resume.load_json("signature_programs.json") 473 | if cached_programs: 474 | signature_program_ids = {sig: set(progs) for sig, progs in cached_programs.items()} 475 | else: 476 | signature_program_ids = {} 477 | cache_root = resume.root 478 | if cache_root: 479 | batch_paths = sorted(Path(cache_root).glob("helius_batch_*.json")) 480 | for batch_path in tqdm(batch_paths, desc="Rebuilding program cache", leave=False): 481 | try: 482 | with batch_path.open("r", encoding="utf-8") as fh: 483 | txs = json.load(fh) 484 | except Exception: 485 | continue 486 | for tx in txs: 487 | sig = tx.get("signature") 488 | if not sig: 489 | continue 490 | signature_program_ids.setdefault(sig, set()).update(_collect_program_ids(tx)) 491 | if cache and signature_program_ids: 492 | cache.save_json("signature_programs.json", _serialize_program_map(signature_program_ids)) 493 | meta_update = resume.load_json("cache_meta.json") or {} 494 | meta_update["program_map_count"] = len(signature_program_ids) 495 | meta_update["updated"] = datetime.now(timezone.utc).isoformat() 496 | cache.save_json("cache_meta.json", meta_update) 497 | for evt in token_events: 498 | if "is_pumpfun_tx" not in evt: 499 | sig = evt.get("signature") 500 | programs = signature_program_ids.get(sig) or set() 501 | evt["is_pumpfun_tx"] = PUMP_FUN_PROGRAM_ID in programs 502 | for evt in token_events: 503 | mint = evt.get("mint") 504 | if not mint or mint in mint_decimals: 505 | continue 506 | dec_val = evt.get("decimals") 507 | try: 508 | mint_decimals[mint] = int(dec_val if dec_val is not None else 0) 509 | except (TypeError, ValueError): 510 | mint_decimals[mint] = 0 511 | else: 512 | # Step 1: collect signatures (fast path: use getSignaturesForAddress) 513 | # We’ll chunk them into 100s and call Helius /v0/transactions for decoded data. 
514 | print("Fetching signatures…") 515 | signatures: List[Dict[str, Any]] = [] 516 | before_sig = None 517 | fetched = 0 518 | cutoff = days_ago_epoch(days) if days else None 519 | 520 | while fetched < limit: 521 | batch = get_signatures_for_address(wallet, before_sig, min(1000, limit - fetched)) 522 | if not batch: 523 | break 524 | signatures.extend(batch) 525 | fetched += len(batch) 526 | before_sig = batch[-1]["signature"] 527 | # Stop early if time cutoff 528 | if cutoff and (batch[-1].get("blockTime") and batch[-1]["blockTime"] < cutoff): 529 | break 530 | 531 | if not signatures: 532 | raise SystemExit("No signatures found for this wallet in the requested window.") 533 | 534 | # Keep latest->oldest order 535 | sig_list = [s["signature"] for s in signatures] 536 | print(f"Total signatures collected: {len(sig_list)}") 537 | if cache: 538 | cache.save_json("signatures.json", sig_list) 539 | cache.save_json( 540 | "cache_meta.json", 541 | { 542 | "complete": False, 543 | "signature_count": len(sig_list), 544 | "pump_events_count": 0, 545 | "program_map_count": 0, 546 | "updated": datetime.now(timezone.utc).isoformat(), 547 | }, 548 | ) 549 | 550 | # Step 2: decode with Helius (batch) 551 | if not HELIUS_API_KEY: 552 | print("WARNING: No HELIUS_API_KEY set. Falling back to raw RPC-only mode is not implemented here; please set a Helius key for enhanced decoding.") 553 | sys.exit(2) 554 | 555 | print("Decoding transactions via Helius Enhanced…") 556 | helius_workers = max(1, int(helius_workers or 1)) 557 | batch_size = 100 558 | total_batches = math.ceil(len(sig_list) / batch_size) 559 | 560 | with tqdm(total=total_batches, desc="Decoding", leave=False) as progress: 561 | 562 | def process_chunk_txs(batch_txs: List[Dict[str, Any]]) -> None: 563 | chunk_events: List[Dict[str, Any]] = [] 564 | for tx in batch_txs: 565 | sig = tx.get("signature") 566 | if sig: 567 | signature_program_ids.setdefault(sig, set()).update(_collect_program_ids(tx)) 568 | if program_filter_ids and not tx_uses_program(tx, program_filter_ids): 569 | continue 570 | 571 | ts = parse_ts(tx) 572 | sol_delta = native_change_for_wallet(tx, wallet) 573 | is_pumpfun_tx = tx_uses_program(tx, {PUMP_FUN_PROGRAM_ID}) 574 | 575 | changes = token_changes_for_wallet(tx, wallet) 576 | for c in changes: 577 | mint = c.get("mint") 578 | if not mint or mint == WSOL_MINT: 579 | continue 580 | raw_info = c.get("rawTokenAmount") or {} 581 | token_amt_str = raw_info.get("tokenAmount") or "0" 582 | try: 583 | token_delta_raw = int(token_amt_str) 584 | except ValueError: 585 | token_delta_raw = int(float(token_amt_str)) 586 | if token_delta_raw == 0: 587 | continue 588 | decimals = int(raw_info.get("decimals") or 0) 589 | side = classify_event(sol_delta, token_delta_raw) 590 | if not side: 591 | continue 592 | chunk_events.append({ 593 | "signature": tx.get("signature"), 594 | "timestamp": ts, 595 | "mint": mint, 596 | "decimals": decimals, 597 | "token_delta_raw": token_delta_raw, 598 | "sol_delta_lamports": sol_delta, 599 | "side": side, 600 | "is_pumpfun_tx": is_pumpfun_tx 601 | }) 602 | if mint not in mint_decimals: 603 | mint_decimals[mint] = decimals 604 | 605 | token_events.extend(chunk_events) 606 | if cache: 607 | cache.save_json("pump_events.json", token_events) 608 | cache.save_json("signature_programs.json", _serialize_program_map(signature_program_ids)) 609 | progress.update(1) 610 | 611 | missing_batches: List[Tuple[int, str, List[str]]] = [] 612 | 613 | for batch_idx in range(total_batches): 614 | start = batch_idx * 
batch_size 615 | chunk = sig_list[start:start + batch_size] 616 | cache_key = f"helius_batch_{batch_idx:06d}.json" 617 | cached_txs = cache.load_json(cache_key) if cache else None 618 | if cached_txs is not None: 619 | process_chunk_txs(cached_txs) 620 | continue 621 | missing_batches.append((batch_idx, cache_key, chunk)) 622 | 623 | if missing_batches: 624 | worker_count = min(len(missing_batches), helius_workers) 625 | if worker_count > 1: 626 | print(f"Helius decoding: using {worker_count} parallel workers…") 627 | 628 | if worker_count == 1: 629 | for batch_idx, cache_key, chunk in missing_batches: 630 | txs = helius_get_transactions_batch(chunk) 631 | if cache: 632 | cache.save_json(cache_key, txs) 633 | process_chunk_txs(txs) 634 | else: 635 | with ThreadPoolExecutor(max_workers=worker_count) as pool: 636 | future_map = { 637 | pool.submit(helius_get_transactions_batch, chunk): (batch_idx, cache_key, chunk) 638 | for batch_idx, cache_key, chunk in missing_batches 639 | } 640 | for fut in as_completed(future_map): 641 | batch_idx, cache_key, chunk = future_map[fut] 642 | try: 643 | txs = fut.result() 644 | except Exception as exc: 645 | print(f"ERROR: failed to fetch Helius batch {batch_idx}: {exc}", file=sys.stderr) 646 | try: 647 | txs = helius_get_transactions_batch(chunk) 648 | except Exception as retry_exc: 649 | print(f"ERROR: retry failed for Helius batch {batch_idx}: {retry_exc}", file=sys.stderr) 650 | progress.update(1) 651 | continue 652 | if cache: 653 | cache.save_json(cache_key, txs) 654 | process_chunk_txs(txs) 655 | 656 | if cache: 657 | cache.save_json( 658 | "cache_meta.json", 659 | { 660 | "complete": True, 661 | "signature_count": len(sig_list), 662 | "pump_events_count": len(token_events), 663 | "program_map_count": len(signature_program_ids), 664 | "updated": datetime.now(timezone.utc).isoformat(), 665 | }, 666 | ) 667 | print( 668 | f"Cache updated ({len(sig_list)} signatures, {len(token_events)} token events) → {cache.root}" 669 | ) 670 | 671 | if not token_events: 672 | raise SystemExit("No relevant token balance changes for this wallet were found (in the scanned history).") 673 | 674 | # Step 3: group per mint and compute metrics 675 | df = pd.DataFrame(token_events) 676 | if "is_pumpfun_tx" not in df.columns: 677 | df["is_pumpfun_tx"] = df["signature"].map( 678 | lambda s: PUMP_FUN_PROGRAM_ID in (signature_program_ids.get(s) or set()) 679 | ) 680 | signature_to_mints: Dict[str, Set[str]] = {} 681 | mint_first_buy_signature: Dict[str, str] = {} 682 | for evt in token_events: 683 | sig = evt.get("signature") 684 | mint = evt.get("mint") 685 | if sig and mint: 686 | signature_to_mints.setdefault(sig, set()).add(mint) 687 | # Keep only BUY/SELL rows 688 | df = df[df["side"].isin(["BUY", "SELL"])] 689 | 690 | # Aggregate per mint 691 | rows = [] 692 | analyzed_mints = 0 693 | 694 | for mint, g in df.sort_values("timestamp").groupby("mint"): 695 | analyzed_mints += 1 696 | if max_mints and analyzed_mints > max_mints: 697 | break 698 | 699 | # First BUY by this wallet 700 | gb = g[g["side"] == "BUY"].sort_values("timestamp") 701 | gs = g[g["side"] == "SELL"].sort_values("timestamp") 702 | 703 | if gb.empty: 704 | continue 705 | 706 | first_buy = gb.iloc[0] 707 | first_buy_ts = int(first_buy["timestamp"]) 708 | if isinstance(first_buy.get("signature"), str): 709 | mint_first_buy_signature[mint] = first_buy["signature"] 710 | 711 | # First SELL strictly after first BUY 712 | first_sell_ts = None 713 | if not gs.empty: 714 | after = gs[gs["timestamp"] > first_buy_ts] 
715 | if not after.empty: 716 | first_sell_ts = int(after.iloc[0]["timestamp"]) 717 | 718 | delay_seconds = (first_sell_ts - first_buy_ts) if first_sell_ts else None 719 | 720 | # Buy/Sell totals (SOL deltas are per-tx net for the wallet, so sum for PnL incl. fees) 721 | lamports_list = g["sol_delta_lamports"].tolist() 722 | net_profit_sol = lamports_to_sol(sum(lamports_list)) 723 | 724 | sol_spent_total = lamports_to_sol(-sum([x for x in g["sol_delta_lamports"] if x < 0])) 725 | sol_received_total = lamports_to_sol(sum([x for x in g["sol_delta_lamports"] if x > 0])) 726 | 727 | first_buy_sol = lamports_to_sol(-first_buy["sol_delta_lamports"]) if first_buy["sol_delta_lamports"] < 0 else 0.0 728 | sells_count = int((g["side"] == "SELL").sum()) 729 | is_pumpfun_mint = bool(g["is_pumpfun_tx"].any()) 730 | 731 | rows.append({ 732 | "mint": mint, 733 | "first_buy_ts": first_buy_ts, 734 | "first_sell_ts": first_sell_ts, 735 | "delay_seconds": delay_seconds, 736 | "sells_count": sells_count, 737 | "first_buy_sol": round(first_buy_sol, 6), 738 | "sol_spent_total": round(sol_spent_total, 6), 739 | "sol_received_total": round(sol_received_total, 6), 740 | "net_profit_sol": round(net_profit_sol, 6), 741 | "is_pumpfun_mint": is_pumpfun_mint 742 | }) 743 | 744 | out_df = pd.DataFrame(rows).sort_values(["first_buy_ts"]).reset_index(drop=True) 745 | if (raydium_launchpad_only or include_raydium_launchpad) and not out_df.empty: 746 | candidate = set(out_df["mint"].unique()) 747 | base_signatures = [mint_first_buy_signature.get(m) for m in candidate if mint_first_buy_signature.get(m)] 748 | if not base_signatures: 749 | base_signatures = list(signature_to_mints.keys()) or sig_list 750 | if not base_signatures: 751 | print("WARNING: No signatures available for Raydium launchpad detection; skipping filter.") 752 | else: 753 | print(f"Raydium launchpad detection: scanning {len(base_signatures)} signatures…") 754 | raydium_launchpad_mints = detect_launchpad_mints( 755 | wallet, 756 | base_signatures, 757 | candidate_mints=candidate, 758 | signature_mints=signature_to_mints, 759 | signature_programs=signature_program_ids, 760 | required_program_ids=RAYDIUM_LAUNCHPAD_PROGRAM_IDS, 761 | trigger_program_ids=RAYDIUM_LAUNCHPAD_TRIGGER_PROGRAM_IDS, 762 | keywords=RAYDIUM_LAUNCHPAD_KEYWORDS, 763 | tqdm_label="raydium launchpad", 764 | max_rpc_workers=8, 765 | ) 766 | print(f"Raydium launchpad detection found {len(raydium_launchpad_mints)} mint(s).") 767 | out_df["is_raydium_launchpad"] = out_df["mint"].isin(raydium_launchpad_mints) 768 | 769 | if raydium_launchpad_only and not out_df.empty: 770 | before_count = len(out_df) 771 | out_df = out_df[out_df["is_raydium_launchpad"]].reset_index(drop=True) 772 | print(f"Raydium launchpad filter kept {len(out_df)}/{before_count} mints.") 773 | if out_df.empty: 774 | print("No Raydium Launchpad-created mints detected for this wallet in the scanned window.") 775 | elif include_raydium_launchpad and not out_df.empty: 776 | before_count = len(out_df) 777 | mask = out_df["is_pumpfun_mint"] | out_df["is_raydium_launchpad"] 778 | out_df = out_df[mask].reset_index(drop=True) 779 | pump_count = int(out_df["is_pumpfun_mint"].sum()) 780 | ray_only = int((out_df["is_raydium_launchpad"] & ~out_df["is_pumpfun_mint"]).sum()) 781 | print( 782 | f"Combined pump.fun + Raydium launchpad kept {len(out_df)}/{before_count} mints " 783 | f"({pump_count} pump.fun, {ray_only} Raydium-only)." 
784 | ) 785 | else: 786 | if not out_df.empty: 787 | before_count = len(out_df) 788 | out_df = out_df[out_df["is_pumpfun_mint"]].reset_index(drop=True) 789 | if len(out_df) != before_count: 790 | print(f"Filtered out {before_count - len(out_df)} non-pump.fun mints.") 791 | 792 | # Step 4: optional metadata fetch (name/symbol) 793 | if fetch_metadata and not out_df.empty: 794 | print("Fetching token metadata (name/symbol) via DAS…") 795 | names = {} 796 | for m in tqdm(out_df["mint"].tolist(), desc="Metadata"): 797 | info = helius_get_asset(m) or {} 798 | content = info.get("content") or {} 799 | md = content.get("metadata") or {} 800 | name = md.get("name") 801 | symbol = md.get("symbol") 802 | names[m] = (name, symbol) 803 | out_df["name"] = out_df["mint"].map(lambda m: (names.get(m) or (None, None))[0]) 804 | out_df["symbol"] = out_df["mint"].map(lambda m: (names.get(m) or (None, None))[1]) 805 | 806 | # Step 5: summaries & correlations 807 | stats = {} 808 | if not out_df.empty: 809 | recent = out_df.dropna(subset=["delay_seconds"]) 810 | stats["count_with_sell"] = int(len(recent)) 811 | stats["sell_within_60s"] = int((recent["delay_seconds"] <= 60).sum()) 812 | stats["sell_60_120s"] = int(((recent["delay_seconds"] > 60) & (recent["delay_seconds"] <= 120)).sum()) 813 | stats["sell_gt_120s"] = int((recent["delay_seconds"] > 120).sum()) 814 | stats["median_delay_s"] = float(recent["delay_seconds"].median()) if not recent.empty else None 815 | stats["median_net_profit_sol"] = float(out_df["net_profit_sol"].median()) if not out_df.empty else None 816 | # Correlations 817 | try: 818 | pear = pearsonr(recent["delay_seconds"], recent["net_profit_sol"]) 819 | spear = spearmanr(recent["delay_seconds"], recent["net_profit_sol"]) 820 | stats["pearson_r_delay_vs_profit"] = float(pear.statistic) if hasattr(pear, "statistic") else float(pear[0]) 821 | stats["spearman_r_delay_vs_profit"] = float(spear.statistic) if hasattr(spear, "statistic") else float(spear[0]) 822 | except Exception: 823 | stats["pearson_r_delay_vs_profit"] = None 824 | stats["spearman_r_delay_vs_profit"] = None 825 | 826 | # Two-batch sells detection 827 | stats["two_batch_sell_count"] = int((out_df["sells_count"] == 2).sum()) 828 | 829 | if cache and signature_program_ids: 830 | cache.save_json("signature_programs.json", _serialize_program_map(signature_program_ids)) 831 | meta_update = cache.load_json("cache_meta.json") or {} 832 | meta_update["program_map_count"] = len(signature_program_ids) 833 | meta_update["updated"] = datetime.now(timezone.utc).isoformat() 834 | cache.save_json("cache_meta.json", meta_update) 835 | 836 | metadata: Dict[str, Any] = { 837 | "mint_decimals": mint_decimals, 838 | "raydium_launchpad_mints": sorted(raydium_launchpad_mints), 839 | } 840 | 841 | return out_df, stats, metadata 842 | 843 | 844 | def _extract_account_owner_map(enhanced_tx: Dict[str, Any], mint: str) -> Dict[str, str]: 845 | """Build map of token account -> owner for the specified mint.""" 846 | owners: Dict[str, str] = {} 847 | for acct in enhanced_tx.get("accountData", []) or []: 848 | for tbc in acct.get("tokenBalanceChanges", []) or []: 849 | if tbc.get("mint") != mint: 850 | continue 851 | token_account = tbc.get("tokenAccount") 852 | user_account = tbc.get("userAccount") 853 | if token_account and user_account and token_account not in owners: 854 | owners[token_account] = user_account 855 | return owners 856 | 857 | 858 | def _mint_transfer_rows_from_tx(enhanced_tx: Dict[str, Any], 859 | mint: str, 860 | creator_wallet: str) -> 
List[Dict[str, Any]]: 861 | """Extract pump.fun mint transfer rows involving this mint.""" 862 | rows: List[Dict[str, Any]] = [] 863 | transfers = enhanced_tx.get("tokenTransfers") or [] 864 | if not transfers: 865 | return rows 866 | 867 | sig = enhanced_tx.get("signature") 868 | ts = parse_ts(enhanced_tx) 869 | owner_map = _extract_account_owner_map(enhanced_tx, mint) 870 | native_transfers = enhanced_tx.get("nativeTransfers") or [] 871 | 872 | def _native_totals(user: Optional[str]) -> Tuple[int, int]: 873 | if not user: 874 | return 0, 0 875 | outgoing = sum(int(nt.get("amount") or 0) for nt in native_transfers if nt.get("fromUserAccount") == user) 876 | incoming = sum(int(nt.get("amount") or 0) for nt in native_transfers if nt.get("toUserAccount") == user) 877 | return outgoing, incoming 878 | 879 | for tf in transfers: 880 | if tf.get("mint") != mint: 881 | continue 882 | token_amount_raw = tf.get("tokenAmount") 883 | if token_amount_raw is None: 884 | continue 885 | try: 886 | token_amount = float(token_amount_raw) 887 | except (TypeError, ValueError): 888 | try: 889 | token_amount = float(str(token_amount_raw)) 890 | except Exception: 891 | continue 892 | from_user = tf.get("fromUserAccount") 893 | to_user = tf.get("toUserAccount") 894 | from_token_account = tf.get("fromTokenAccount") 895 | to_token_account = tf.get("toTokenAccount") 896 | 897 | resolved_from_user = from_user 898 | resolved_to_user = to_user 899 | 900 | if not resolved_from_user and from_token_account: 901 | resolved_from_user = owner_map.get(from_token_account) 902 | if not resolved_to_user and to_token_account: 903 | resolved_to_user = owner_map.get(to_token_account) 904 | 905 | from_out, from_in = _native_totals(resolved_from_user) 906 | to_out, to_in = _native_totals(resolved_to_user) 907 | 908 | rows.append({ 909 | "mint": mint, 910 | "signature": sig, 911 | "timestamp": ts, 912 | "from_user": from_user, 913 | "to_user": to_user, 914 | "from_user_resolved": resolved_from_user, 915 | "to_user_resolved": resolved_to_user, 916 | "from_token_account": from_token_account, 917 | "to_token_account": to_token_account, 918 | "token_amount": token_amount, 919 | "token_amount_abs": abs(token_amount), 920 | "token_direction": "out" if token_amount < 0 else "in", 921 | "sol_from_user_spent": lamports_to_sol(from_out), 922 | "sol_from_user_received": lamports_to_sol(from_in), 923 | "sol_to_user_spent": lamports_to_sol(to_out), 924 | "sol_to_user_received": lamports_to_sol(to_in), 925 | "is_creator_involved": bool(resolved_from_user == creator_wallet or resolved_to_user == creator_wallet), 926 | }) 927 | return rows 928 | 929 | 930 | def _unique_preserve(items: List[str]) -> List[str]: 931 | """Return list without duplicates while preserving order.""" 932 | seen: Set[str] = set() 933 | out: List[str] = [] 934 | for item in items: 935 | if item in seen: 936 | continue 937 | seen.add(item) 938 | out.append(item) 939 | return out 940 | 941 | 942 | def collect_mint_transfers(mint: str, 943 | creator_wallet: str, 944 | signature_limit: Optional[int] = None) -> List[Dict[str, Any]]: 945 | """ 946 | Fetch decoded transactions touching the mint and pull token transfer rows that 947 | involve the mint. Returns a list of dict rows suitable for CSV export. 
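
    signature_limit semantics: None or 0 scans the mint's full signature history,
    a positive value caps how many signatures are paged in, and a negative value
    short-circuits to an empty result.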
948 | """ 949 | if signature_limit is not None and signature_limit < 0: 950 | return [] 951 | 952 | collected: List[Dict[str, Any]] = [] 953 | signatures: List[str] = [] 954 | before_sig: Optional[str] = None 955 | 956 | def _remaining() -> Optional[int]: 957 | if signature_limit is None or signature_limit == 0: 958 | return None 959 | return max(signature_limit - len(signatures), 0) 960 | 961 | while True: 962 | remaining = _remaining() 963 | if remaining is not None and remaining <= 0: 964 | break 965 | fetch_limit = 1000 if remaining is None else min(1000, remaining) 966 | batch = get_signatures_for_address(mint, before_sig, fetch_limit) 967 | if not batch: 968 | break 969 | signatures.extend([b["signature"] for b in batch if b.get("signature")]) 970 | before_sig = batch[-1]["signature"] 971 | if len(batch) < fetch_limit: 972 | break 973 | 974 | if not signatures: 975 | return collected 976 | 977 | unique_sigs = _unique_preserve(signatures) 978 | 979 | for start in range(0, len(unique_sigs), 100): 980 | chunk = unique_sigs[start:start + 100] 981 | txs = helius_get_transactions_batch(chunk) 982 | for tx in txs: 983 | collected.extend(_mint_transfer_rows_from_tx(tx, mint, creator_wallet)) 984 | 985 | return collected 986 | 987 | 988 | def export_mint_tx_csv(mint: str, 989 | creator_wallet: str, 990 | output_dir: Path, 991 | signature_limit: Optional[int] = None, 992 | token_decimals: Optional[int] = None) -> Tuple[Optional[Path], int]: 993 | """ 994 | Collect mint token transfers and persist them as CSV. 995 | Returns (path, row_count); path is None if nothing was exported. 996 | """ 997 | output_dir.mkdir(parents=True, exist_ok=True) 998 | rows = collect_mint_transfers(mint, creator_wallet, signature_limit=signature_limit) 999 | if not rows: 1000 | return None, 0 1001 | 1002 | rows.sort(key=lambda r: (r.get("timestamp") or 0, str(r.get("signature") or ""))) 1003 | df = pd.DataFrame(rows) 1004 | df["time_utc"] = df["timestamp"].map(human_time) 1005 | if token_decimals is not None: 1006 | df["token_decimals"] = token_decimals 1007 | columns = [ 1008 | "mint", 1009 | "signature", 1010 | "timestamp", 1011 | "time_utc", 1012 | "from_user", 1013 | "from_user_resolved", 1014 | "to_user", 1015 | "to_user_resolved", 1016 | "from_token_account", 1017 | "to_token_account", 1018 | "token_amount", 1019 | "token_amount_abs", 1020 | "token_direction", 1021 | "token_decimals", 1022 | "sol_from_user_spent", 1023 | "sol_from_user_received", 1024 | "sol_to_user_spent", 1025 | "sol_to_user_received", 1026 | "is_creator_involved", 1027 | ] 1028 | cols_present = [c for c in columns if c in df.columns] 1029 | 1030 | out_path = output_dir / f"{sanitize_for_fs(mint)}_mint_txs.csv" 1031 | df[cols_present].to_csv(out_path, index=False) 1032 | return out_path, len(df) 1033 | 1034 | # ---------- CLI ---------- 1035 | 1036 | def human_time(ts: Optional[Any]) -> str: 1037 | if ts is None or pd.isna(ts): 1038 | return "-" 1039 | 1040 | if isinstance(ts, datetime): 1041 | dt = ts.astimezone(timezone.utc) 1042 | else: 1043 | try: 1044 | ts_float = float(ts) 1045 | except (TypeError, ValueError): 1046 | return "-" 1047 | 1048 | if math.isnan(ts_float) or math.isinf(ts_float): 1049 | return "-" 1050 | 1051 | try: 1052 | dt = datetime.fromtimestamp(ts_float, tz=timezone.utc) 1053 | except (ValueError, OSError): 1054 | return "-" 1055 | 1056 | return dt.strftime("%Y-%m-%d %H:%M:%S UTC") 1057 | 1058 | def main(): 1059 | ap = argparse.ArgumentParser() 1060 | ap.add_argument("--wallet", required=True, help="Dev wallet address 
to analyze") 1061 | ap.add_argument("--limit", type=int, default=25000, help="Max signatures to scan") 1062 | ap.add_argument("--days", type=int, default=None, help="Only scan last N days") 1063 | ap.add_argument("--fetch-metadata", action="store_true", help="Fetch token name/symbol via DAS (slower)") 1064 | ap.add_argument("--max-mints", type=int, default=None, help="Limit number of mints processed for testing") 1065 | ap.add_argument("--raydium-launchpad", action="store_true", 1066 | help="Restrict to coins this wallet launched on Raydium Launchpad (best-effort log detection)") 1067 | ap.add_argument("--include-raydium-launchpad", action="store_true", 1068 | help="Include Raydium Launchpad mints in addition to pump.fun results") 1069 | ap.add_argument("--cache-dir", type=str, default=None, 1070 | help="Directory to persist decoded signatures/transactions for reuse (default: mode-specific cache)") 1071 | ap.add_argument("--resume-cache", type=str, default=None, 1072 | help="Path to a cache directory created by a previous run (skips re-decoding)") 1073 | ap.add_argument("--export-mint-txs", action="store_true", 1074 | help="Export per-mint transaction CSVs capturing all token transfers for each mint") 1075 | ap.add_argument("--mint-tx-limit", type=int, default=None, dest="mint_tx_limit", 1076 | help="Max signatures per mint when exporting mint transaction CSVs (0 => unlimited)") 1077 | ap.add_argument("--sniper-limit", type=int, dest="mint_tx_limit", help=argparse.SUPPRESS) 1078 | ap.add_argument("--mint-tx-output-dir", type=str, default=None, dest="mint_tx_output_dir", 1079 | help="Directory to write per-mint mint transaction CSVs (default: mint_txs/)") 1080 | ap.add_argument("--sniper-output-dir", type=str, dest="mint_tx_output_dir", help=argparse.SUPPRESS) 1081 | default_helius_workers = max(1, min(4, os.cpu_count() or 1)) 1082 | ap.add_argument("--helius-workers", type=int, default=default_helius_workers, 1083 | help="Concurrent Helius batch requests (default: %(default)s). 
Use 1 to disable parallel decoding.") 1084 | args = ap.parse_args() 1085 | 1086 | if args.raydium_launchpad and args.include_raydium_launchpad: 1087 | ap.error("Use either --raydium-launchpad or --include-raydium-launchpad, not both.") 1088 | 1089 | mode_suffix = ( 1090 | "raydium_launchpad" 1091 | if args.raydium_launchpad 1092 | else "pumpfun_plus_raydium" 1093 | if args.include_raydium_launchpad 1094 | else "pumpfun" 1095 | ) 1096 | default_cache_root = Path(".cache") / f"{sanitize_for_fs(args.wallet)}_{mode_suffix}" 1097 | 1098 | cache_dir_path = Path(args.cache_dir).expanduser() if args.cache_dir else default_cache_root 1099 | resume_cache_path: Optional[Path] = Path(args.resume_cache).expanduser() if args.resume_cache else None 1100 | 1101 | if resume_cache_path is None: 1102 | meta_path = cache_dir_path / "cache_meta.json" 1103 | if meta_path.exists(): 1104 | try: 1105 | with meta_path.open("r", encoding="utf-8") as fh: 1106 | cached_meta = json.load(fh) 1107 | except Exception: 1108 | cached_meta = None 1109 | if cached_meta and cached_meta.get("complete"): 1110 | resume_cache_path = cache_dir_path 1111 | elif not args.cache_dir: 1112 | # Backwards compatibility: attempt to auto-detect legacy cache without suffix 1113 | legacy_root = Path(".cache") / sanitize_for_fs(args.wallet) 1114 | legacy_meta = legacy_root / "cache_meta.json" 1115 | if legacy_meta.exists(): 1116 | try: 1117 | with legacy_meta.open("r", encoding="utf-8") as fh: 1118 | cached_meta = json.load(fh) 1119 | except Exception: 1120 | cached_meta = None 1121 | if cached_meta and cached_meta.get("complete"): 1122 | print(f"Legacy cache detected at {legacy_root}; using it for this run.") 1123 | resume_cache_path = legacy_root 1124 | cache_dir_path = legacy_root 1125 | 1126 | df, stats, metadata = analyze_wallet( 1127 | wallet=args.wallet, 1128 | limit=args.limit, 1129 | days=args.days, 1130 | fetch_metadata=args.fetch_metadata, 1131 | max_mints=args.max_mints, 1132 | raydium_launchpad_only=args.raydium_launchpad, 1133 | include_raydium_launchpad=args.include_raydium_launchpad, 1134 | cache_dir=str(cache_dir_path), 1135 | resume_cache=str(resume_cache_path) if resume_cache_path else None, 1136 | helius_workers=args.helius_workers, 1137 | ) 1138 | 1139 | if df.empty: 1140 | print("\nNo pump.fun BUY/SELL activity detected for this wallet.") 1141 | return 1142 | 1143 | # Save CSV 1144 | csv_suffix = mode_suffix 1145 | csv_path = f"{args.wallet}_{csv_suffix}.csv" 1146 | df_out = df.copy() 1147 | # Add human-readable times 1148 | df_out["first_buy_time"] = df_out["first_buy_ts"].map(human_time) 1149 | df_out["first_sell_time"] = df_out["first_sell_ts"].map(human_time) 1150 | cols = [ 1151 | "mint", 1152 | "name", 1153 | "symbol", 1154 | "first_buy_time", 1155 | "first_sell_time", 1156 | "delay_seconds", 1157 | "sells_count", 1158 | "first_buy_sol", 1159 | "sol_spent_total", 1160 | "sol_received_total", 1161 | "net_profit_sol", 1162 | "is_pumpfun_mint", 1163 | "is_raydium_launchpad", 1164 | ] 1165 | cols = [c for c in cols if c in df_out.columns] 1166 | df_out[cols].to_csv(csv_path, index=False) 1167 | print(f"\n✔ Saved per-mint results → {csv_path}") 1168 | 1169 | mint_decimals_map = metadata.get("mint_decimals", {}) if isinstance(metadata, dict) else {} 1170 | mint_tx_limit_raw = args.mint_tx_limit 1171 | export_mint_txs = bool(args.export_mint_txs) 1172 | mint_tx_limit: Optional[int] 1173 | 1174 | if mint_tx_limit_raw is None: 1175 | mint_tx_limit = None 1176 | elif mint_tx_limit_raw < 0: 1177 | mint_tx_limit = None 1178 | 
export_mint_txs = False 1179 | elif mint_tx_limit_raw == 0: 1180 | mint_tx_limit = None 1181 | export_mint_txs = True 1182 | else: 1183 | mint_tx_limit = mint_tx_limit_raw 1184 | export_mint_txs = True 1185 | 1186 | if export_mint_txs and not df.empty: 1187 | limit_text = "no limit" if mint_tx_limit is None else f"limit {mint_tx_limit}" 1188 | mint_tx_root = Path(args.mint_tx_output_dir).expanduser() if args.mint_tx_output_dir else Path("mint_txs") / sanitize_for_fs(args.wallet) 1189 | print(f"\nExporting mint transaction CSVs ({limit_text} per mint)…") 1190 | for mint in df["mint"].tolist(): 1191 | try: 1192 | out_path, row_count = export_mint_tx_csv( 1193 | mint, 1194 | args.wallet, 1195 | mint_tx_root, 1196 | signature_limit=mint_tx_limit, 1197 | token_decimals=mint_decimals_map.get(mint), 1198 | ) 1199 | except Exception as exc: 1200 | print(f" {mint}: ERROR exporting mint tx data ({exc})", file=sys.stderr) 1201 | continue 1202 | if out_path: 1203 | print(f" {mint}: wrote {row_count} rows → {out_path}") 1204 | else: 1205 | print(f" {mint}: no qualifying token transfers found (within signature criteria).") 1206 | 1207 | # Print a compact summary 1208 | print("\n===== SUMMARY =====") 1209 | print(f"Total mints analyzed: {len(df)} (unique: {df['mint'].nunique()})") 1210 | if stats: 1211 | print(f"Mints with at least one SELL: {stats.get('count_with_sell', 0)}") 1212 | print(f"SELL delay: ≤60s={stats.get('sell_within_60s', 0)}, 60–120s={stats.get('sell_60_120s', 0)}, >120s={stats.get('sell_gt_120s', 0)}") 1213 | print(f"Median delay (s): {stats.get('median_delay_s')}") 1214 | print(f"Median net profit (SOL): {stats.get('median_net_profit_sol')}") 1215 | print(f"Two-batch sells (sells_count==2): {stats.get('two_batch_sell_count', 0)}") 1216 | print("Correlation delay vs net profit (SOL):") 1217 | print(f" Pearson r: {stats.get('pearson_r_delay_vs_profit')}") 1218 | print(f" Spearman r: {stats.get('spearman_r_delay_vs_profit')}") 1219 | if "is_raydium_launchpad" in df_out.columns: 1220 | pump_col = df_out.get("is_pumpfun_mint") 1221 | pump_count = int(pump_col.sum()) if pump_col is not None else 0 1222 | ray_col = df_out["is_raydium_launchpad"] 1223 | ray_count = int(ray_col.sum()) 1224 | ray_only = int((ray_col & (~pump_col if pump_col is not None else True)).sum()) if ray_count else 0 1225 | if args.raydium_launchpad: 1226 | print(f"Raydium Launchpad mints: {ray_count}") 1227 | elif args.include_raydium_launchpad: 1228 | print(f"pump.fun mints: {pump_count}, Raydium launchpad mints: {ray_count} (Raydium-only: {ray_only})") 1229 | 1230 | # Show a few examples (top/bottom by net profit) 1231 | top = df_out.sort_values("net_profit_sol", ascending=False).head(5) 1232 | low = df_out.sort_values("net_profit_sol", ascending=True).head(5) 1233 | def short(m): 1234 | return f"{m[:4]}…{m[-4:]}" if isinstance(m, str) else m 1235 | print("\nTop 5 by net profit (SOL):") 1236 | for _, r in top.iterrows(): 1237 | print(f" {short(r.get('symbol') or r.get('name') or r['mint'])} +{r['net_profit_sol']} SOL delay={r['delay_seconds']}s sells={r['sells_count']}") 1238 | print("\nBottom 5 by net profit (SOL):") 1239 | for _, r in low.iterrows(): 1240 | print(f" {short(r.get('symbol') or r.get('name') or r['mint'])} {r['net_profit_sol']} SOL delay={r['delay_seconds']}s sells={r['sells_count']}") 1241 | 1242 | if __name__ == "__main__": 1243 | main() 1244 | --------------------------------------------------------------------------------