├── .gitignore
├── README.md
└── cli.py

/.gitignore:
--------------------------------------------------------------------------------
 1 | # macOS system files
 2 | .DS_Store
 3 | .AppleDouble
 4 | .LSOverride
 5 | Icon
 6 | ._*
 7 | .DocumentRevisions-V100
 8 | .fseventsd
 9 | .Spotlight-V100
10 | .TemporaryItems
11 | .Trashes
12 | .VolumeIcon.icns
13 | .com.apple.timemachine.donotpresent
14 | .AppleDB
15 | .AppleDesktop
16 | Network Trash Folder
17 | Temporary Items
18 | .apdisk
19 | 
20 | # Python
21 | __pycache__/
22 | *.py[cod]
23 | *$py.class
24 | *.so
25 | .Python
26 | build/
27 | develop-eggs/
28 | dist/
29 | downloads/
30 | eggs/
31 | .eggs/
32 | lib/
33 | lib64/
34 | parts/
35 | sdist/
36 | var/
37 | wheels/
38 | *.egg-info/
39 | .installed.cfg
40 | *.egg
41 | MANIFEST
42 | .env
43 | .venv
44 | env/
45 | venv/
46 | ENV/
47 | env.bak/
48 | venv.bak/
49 | .python-version
50 | .pytest_cache/
51 | .coverage
52 | htmlcov/
53 | 
54 | # IDEs and editors
55 | .idea/
56 | .vscode/
57 | *.swp
58 | *.swo
59 | *~
60 | .project
61 | .classpath
62 | .settings/
63 | *.sublime-workspace
64 | *.sublime-project
65 | 
66 | # Project-specific files
67 | *.txt
68 | *.log
69 | *.spr
70 | *.spr.txt
71 | 
72 | # Keep specific text files
73 | !requirements.txt
74 | !.env.example
75 | !LICENSE.txt
76 | !README.txt
77 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # codeiumSPR
 2 | 
 3 | Because AI Assistants Shouldn't Have Goldfish Memory
 4 | 
 5 | A specialized tool for optimizing chat history context in Windsurf IDE.
 6 | 
 7 | This tool processes chat logs into a Sparse Priming Representation
 8 | (SPR) format, enabling better context retention and understanding across coding
 9 | sessions.
10 | 
11 | 
12 | 
13 | ## Hey Kids!
14 | 
15 | Ever had your AI coding assistant forget what you were working on faster than
16 | you forget your keys? Say hello to codeiumSPR :)
17 | 
18 | Using my super-duper-not-a-pooper-scooper Sparse Priming Representation (SPR)
19 | script, you can:
20 | 
21 | - Bring the agent up to speed quicker than your cat knocking stuff off tables
22 |   (unlike your ex)
23 | - Keep context around longer than your New Year's resolutions
24 | - Parse chat histories smoother than your pickup lines
25 | 
26 | But wait, there's more! Act now and you'll get:
27 | 
28 | - A new session that doesn't ask "wait, what were we doing again?"
29 | - A context window larger than your grandma's bumbum
30 | - Error tracking better than your excuses for missed deadlines
31 | 
32 | P.S. Side effects may include: actually finishing your projects, fewer face-palm
33 | moments, a suspicious amount of productive coding sessions, explosive diarrhea,
34 | nausea, headaches, spontaneous combustion, and an inexplicable urge to
35 | high-five your rubber duck.
36 | 
37 | P.P.S. No goldfish were harmed in the making of this neat little script. They
38 | just helped with the memory testing. 🐠
39 | 
40 | 
41 | ## Features
42 | 
43 | - Semantic parsing of chat histories
44 | - Problem and solution linking
45 | - State change tracking
46 | - Dependency analysis
47 | - Error context preservation
48 | 
49 | ## Installation
50 | 
51 | ```bash
52 | # Clone the repository
53 | git clone https://github.com/claytondukes/codeiumSPR.git
54 | 
55 | # Navigate to the directory
56 | cd codeiumSPR
57 | ```
58 | 
59 | ## Usage
60 | 
61 | 1. Copy/paste your entire chat history into a text file
62 | 2. Run the parser:
63 | 
64 | ```bash
65 | python cli.py chat_history.txt
66 | ```
67 | 
68 | The tool will generate a `chat_history.spr.txt` file containing the optimized
69 | context.
70 | 
71 | On your next session, tell it to read the file(s) and you're good to go!
72 | 
73 | Example:
74 | 
75 | ```text
76 | Check the following for our session histories, in order of oldest to newest:
77 | @history.spr.txt @history2.spr.txt @history3.spr.txt @history4.spr.txt @history5.spr.txt
78 | Then read all @docs
79 | ```
80 | 
81 | ## SPR Format
82 | 
83 | The Sparse Priming Representation (SPR) uses the following structure (a complete worked example appears in the Example Output section below):
84 | 
85 | ### Metadata Markers
86 | 
87 | - `#T`: Timestamp (ISO 8601 format)
88 | - `#S`: Session type (DEBUG, FEATURE, REFACTOR, etc.)
89 | - `#I`: Main issue or task
90 | 
91 | ### Context Blocks
92 | 
93 | ```text
94 | @CONTEXT{
95 |   "issues": [
96 |     {
97 |       "type": "category",
98 |       "summary": "Concise problem description",
99 |       "details": ["Detailed information"],
100 |       "context": {"relevant": "metadata"}
101 |     }
102 |   ],
103 |   "reasoning": ["Chain of thought"]
104 | }
105 | ```
106 | 
107 | ### Change Tracking
108 | 
109 | ```text
110 | @CHANGES{
111 |   "file_path": ["modifications"]
112 | }
113 | ```
114 | 
115 | ### State Management
116 | 
117 | ```text
118 | @STATE{
119 |   "dependencies": ["required components"],
120 |   "changes": ["state modifications"]
121 | }
122 | ```
123 | 
124 | ### Error Context
125 | 
126 | ```text
127 | @ERRORS{
128 |   "type": "error_category",
129 |   "discussion": {
130 |     "analysis": ["investigation steps"],
131 |     "solutions": ["proposed fixes"]
132 |   }
133 | }
134 | ```
135 | 
136 | ## Development
137 | 
138 | ### Prerequisites
139 | 
140 | - Python 3.8+
141 | - pip package manager (no third-party packages are currently required)
142 | - Git
143 | 
144 | ## Contributing
145 | 
146 | 1. Fork the repository
147 | 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
148 | 3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
149 | 4. Push to the branch (`git push origin feature/amazing-feature`)
150 | 5. Open a Pull Request
151 | 
152 | ## Command Line Options
153 | 
154 | ```bash
155 | python cli.py [-h] [-v] [--debug] [-o OUTPUT] input_file
156 | ```
157 | 
158 | Arguments:
159 | 
160 | - `input_file`: Path to the chat history file to parse
161 | - `-h, --help`: Show this help message and exit
162 | - `-v, --verbose`: Enable verbose output for debugging
163 | - `--debug`: Enable debug mode with additional logging
164 | - `-o, --output`: Specify output file path (default: input_file.spr.txt)
165 | 
166 | ## License
167 | 
168 | This project is licensed under the MIT License - see the LICENSE file for details.
169 | 
170 | ## Support
171 | 
172 | Pshaw...
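173 | 
174 | ## Example Output
175 | 
176 | Here's a hand-written illustration of the shape of a single SPR block (not
177 | generated from a real session, and with the JSON pretty-printing condensed):
178 | 
179 | ```text
180 | #T:2024-12-05T14:32:11
181 | #S:DEBUG
182 | #I:ModuleNotFoundError: No module named 'requests'
183 | @CONTEXT{
184 | {
185 |   "issues": [
186 |     {
187 |       "type": "import",
188 |       "summary": "Import issues with modules: requests",
189 |       "details": ["ModuleNotFoundError: No module named 'requests'"],
190 |       "context": {"modules": ["requests"]},
191 |       "solutions": ["install requests in the project environment"]
192 |     }
193 |   ],
194 |   "reasoning": []
195 | }
196 | }
197 | @CHANGES{
198 | {
199 |   "api/server.py": ["modified", "fixed"]
200 | }
201 | }
202 | ```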
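203 | 
204 | ## Using the Parser from Python
205 | 
206 | The CLI is a thin wrapper around the `ChatLogParser` class in `cli.py`, so you
207 | can also drive it from your own scripts:
208 | 
209 | ```python
210 | from cli import ChatLogParser
211 | 
212 | chat_parser = ChatLogParser("chat_history.txt")  # path to your exported chat log
213 | chat_parser.parse_log()                          # split the log into context blocks
214 | spr_text = chat_parser.generate_spr()            # render the SPR string
215 | print(spr_text)
216 | ```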
217 | 
218 | ## Acknowledgments
219 | 
220 | - Windsurf IDE team
221 | - Codeium engineering team
222 | - Open source contributors
223 | 
224 | 
--------------------------------------------------------------------------------
/cli.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | import re
  4 | import datetime
  5 | import logging
  6 | import argparse
  7 | import json
  8 | from dataclasses import dataclass, field
  9 | from typing import List, Dict, Set, Optional, Union, Any
 10 | from pathlib import Path
 11 | from enum import Enum
 12 | import ast
 13 | from collections import defaultdict
 14 | import copy
 15 | 
 16 | # Configure logging
 17 | logging.basicConfig(
 18 |     level=logging.INFO,
 19 |     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
 20 | )
 21 | logger = logging.getLogger(__name__)
 22 | 
 23 | class ActionType(Enum):
 24 |     """Types of actions that can occur in the chat."""
 25 |     MOD = "MOD"            # Modification
 26 |     DISC = "DISC"          # Discussion
 27 |     DOC = "DOC"            # Documentation
 28 |     VERIFY = "VERIFY"      # Verification
 29 |     FIX = "FIX"            # Bug fix
 30 |     REFACTOR = "REFACTOR"  # Code refactor
 31 |     TEST = "TEST"          # Testing
 32 |     CONFIG = "CONFIG"      # Configuration
 33 | 
 34 | class ComponentType(Enum):
 35 |     """Types of system components that can be referenced."""
 36 |     API = "api"
 37 |     DATA = "data"
 38 |     DOCS = "docs"
 39 |     SCHEMA = "schema"
 40 |     BUILD = "build"
 41 |     ROUTE = "route"
 42 |     MODEL = "model"
 43 |     SERVICE = "service"
 44 |     CONFIG = "config"
 45 | 
 46 | @dataclass
 47 | class FileReference:
 48 |     """Reference to a file and its associated metadata."""
 49 |     path: str
 50 |     component: ComponentType
 51 |     changes: List[str] = field(default_factory=list)
 52 |     impacts: List[str] = field(default_factory=list)
 53 | 
 54 | @dataclass
 55 | class ErrorTrace:
 56 |     """Details of an error occurrence."""
 57 |     type: str
 58 |     message: str
 59 |     file: Optional[str] = None
 60 |     line: Optional[int] = None
 61 |     stack: List[str] = field(default_factory=list)
 62 | 
 63 | @dataclass
 64 | class HistoryEntry:
 65 |     """A single entry in the chat history."""
 66 |     timestamp: str
 67 |     action_type: ActionType
 68 |     target: str
 69 |     files: List[FileReference]
 70 |     impacts: List[str]
 71 |     actions: List[str]
 72 |     error_traces: List[ErrorTrace] = field(default_factory=list)
 73 |     discussion_points: List[str] = field(default_factory=list)
 74 | 
 75 | @dataclass
 76 | class Context:
 77 |     """Context information for the current state."""
 78 |     focus: str
 79 |     dependencies: List[str]
 80 |     schema_version: str
 81 |     data_version: str
 82 |     issues: List[str]
 83 |     tools: List[str]
 84 |     configs: Dict[str, str]
 85 |     artifacts: List[str]
 86 | 
 87 | @dataclass
 88 | class CurrentState:
 89 |     """Current state of the system/conversation."""
 90 |     task: str
 91 |     state: str
 92 |     modifications: List[FileReference]
 93 |     next_steps: List[str]
 94 |     blockers: List[str]
 95 | 
 96 | @dataclass
 97 | class ContextBlock:
 98 |     """AI-optimized context block for session understanding."""
 99 |     timestamp: str
100 |     session_type: str
101 |     main_issue: str
102 |     core_problems: List[str]
103 |     solution_approaches: List[str]
104 |     key_files: Dict[str, List[str]]  # file -> [changes]
105 |     dependencies: Dict[str, List[str]]  # component -> [dependencies]
106 |     reasoning_chain: List[Dict[str, Any]]  # List of {action, reason, result}
107 |     state_changes: Dict[str, Any]  # Track important state changes
108 |     error_context: Dict[str, List[str]]  # error_type -> [relevant_context]
109 | 
110 | class CodeBlockParser:
111 |     """Parses code blocks 
and their context from chat messages.""" 112 | 113 | def __init__(self): 114 | self.language_patterns = { 115 | 'python': r'```python\n(.*?)\n```', 116 | 'json': r'```json\n(.*?)\n```', 117 | 'javascript': r'```javascript\n(.*?)\n```', 118 | 'typescript': r'```typescript\n(.*?)\n```', 119 | 'bash': r'```bash\n(.*?)\n```', 120 | 'sql': r'```sql\n(.*?)\n```', 121 | # Support for inline code 122 | 'inline': r'`([^`]+)`', 123 | # Support for code without language specification 124 | 'generic': r'```\n(.*?)\n```' 125 | } 126 | 127 | def extract_code_blocks(self, text: str) -> List[Dict[str, str]]: 128 | """ 129 | Extract all code blocks from text. 130 | 131 | Args: 132 | text: The text to parse 133 | 134 | Returns: 135 | List of dicts containing language, code, and position info 136 | """ 137 | blocks = [] 138 | for lang, pattern in self.language_patterns.items(): 139 | matches = re.finditer(pattern, text, re.DOTALL) 140 | for match in matches: 141 | blocks.append({ 142 | 'language': lang, 143 | 'code': match.group(1).strip(), 144 | 'start': match.start(), 145 | 'end': match.end(), 146 | 'full_match': match.group(0) 147 | }) 148 | return sorted(blocks, key=lambda x: x['start']) 149 | 150 | class ErrorParser: 151 | """Parses error traces and exception details from chat messages.""" 152 | 153 | def __init__(self): 154 | self.error_patterns = { 155 | 'import': r'ImportError: (.*?)(?:\n|$)', 156 | 'type': r'TypeError: (.*?)(?:\n|$)', 157 | 'value': r'ValueError: (.*?)(?:\n|$)', 158 | 'attribute': r'AttributeError: (.*?)(?:\n|$)', 159 | 'key': r'KeyError: (.*?)(?:\n|$)', 160 | 'index': r'IndexError: (.*?)(?:\n|$)', 161 | 'name': r'NameError: (.*?)(?:\n|$)', 162 | 'syntax': r'SyntaxError: (.*?)(?:\n|$)', 163 | 'runtime': r'RuntimeError: (.*?)(?:\n|$)', 164 | 'assertion': r'AssertionError: (.*?)(?:\n|$)', 165 | 'indentation': r'IndentationError: (.*?)(?:\n|$)', 166 | 'os': r'OSError: (.*?)(?:\n|$)', 167 | 'io': r'IOError: (.*?)(?:\n|$)', 168 | 'permission': r'PermissionError: (.*?)(?:\n|$)', 169 | 'file_not_found': r'FileNotFoundError: (.*?)(?:\n|$)', 170 | 'module_not_found': r'ModuleNotFoundError: (.*?)(?:\n|$)', 171 | } 172 | self.file_line_pattern = r'File "([^"]+)", line (\d+)' 173 | self.traceback_pattern = r'Traceback \(most recent call last\):\n(.*?)(?:\n\n|$)' 174 | 175 | def parse_error(self, text: str) -> Optional[ErrorTrace]: 176 | """ 177 | Parse error information from text. 
178 | 179 | Args: 180 | text: The text to parse 181 | 182 | Returns: 183 | ErrorTrace object if an error is found, None otherwise 184 | """ 185 | try: 186 | for error_type, pattern in self.error_patterns.items(): 187 | match = re.search(pattern, text, re.DOTALL) 188 | if match: 189 | error = ErrorTrace( 190 | type=error_type, 191 | message=match.group(1).strip() 192 | ) 193 | 194 | # Extract file and line info 195 | file_match = re.search(self.file_line_pattern, text) 196 | if file_match: 197 | error.file = file_match.group(1) 198 | error.line = int(file_match.group(2)) 199 | 200 | # Extract stack trace 201 | trace_match = re.search(self.traceback_pattern, text, re.DOTALL) 202 | if trace_match: 203 | error.stack = [ 204 | line.strip() 205 | for line in trace_match.group(1).split('\n') 206 | if line.strip() 207 | ] 208 | 209 | return error 210 | 211 | except Exception as e: 212 | logger.warning(f"Error parsing error trace: {str(e)}") 213 | 214 | return None 215 | 216 | class ChatLogParser: 217 | """Parses chat logs into structured SPR format optimized for AI consumption.""" 218 | 219 | def __init__(self, file_path: str): 220 | self.file_path = Path(file_path) 221 | self.context_blocks: List[ContextBlock] = [] 222 | self.current_block = None 223 | self.code_parser = CodeBlockParser() 224 | self.error_parser = ErrorParser() 225 | 226 | # Enhanced patterns for AI context extraction 227 | self.issue_patterns = [ 228 | r'(?:error|issue|problem|bug).*?:\s*(.*?)(?:\n|$)', 229 | r'(?:fails?|breaks?|doesn\'t work).*?(?:because|when|if)\s*(.*?)(?:\n|$)', 230 | r'(?:need|should|must)\s+(?:to\s+)?(?:fix|resolve|address)\s*(.*?)(?:\n|$)' 231 | ] 232 | self.solution_patterns = [ 233 | r'(?:fix|solve|resolve)\s+(?:this|the|that)\s+by\s*(.*?)(?:\n|$)', 234 | r'(?:let|going|need)\s+(?:me|to)\s*(?:try|implement|add|update)\s*(.*?)(?:\n|$)', 235 | r'(?:solution|approach|fix)\s+(?:is|would be)\s+to\s*(.*?)(?:\n|$)' 236 | ] 237 | self.reasoning_patterns = [ 238 | r'(?:because|since|as)\s*(.*?)(?:\n|$)', 239 | r'(?:this|that|it)\s+(?:means|implies|suggests)\s*(.*?)(?:\n|$)', 240 | r'(?:the|this|that)\s+(?:leads to|results in|causes)\s*(.*?)(?:\n|$)' 241 | ] 242 | 243 | def parse_log(self): 244 | """Parse the chat log into AI-optimized context blocks.""" 245 | with open(self.file_path, 'r') as f: 246 | content = f.read() 247 | 248 | # Split into logical blocks based on context shifts 249 | blocks = self._split_into_context_blocks(content) 250 | 251 | for block in blocks: 252 | context = ContextBlock( 253 | timestamp=self._extract_timestamp(block), 254 | session_type=self._infer_session_type(block), 255 | main_issue=self._extract_main_issue(block), 256 | core_problems=self._extract_core_problems(block), 257 | solution_approaches=self._extract_solutions(block), 258 | key_files=self._extract_file_changes(block), 259 | dependencies=self._extract_dependencies(block), 260 | reasoning_chain=self._extract_reasoning_chain(block), 261 | state_changes=self._extract_state_changes(block), 262 | error_context=self._extract_error_context(block) 263 | ) 264 | self.context_blocks.append(context) 265 | 266 | def generate_spr(self) -> str: 267 | """Generate AI-optimized SPR format.""" 268 | output = [] 269 | 270 | for block in self.context_blocks: 271 | # Metadata section 272 | output.append(f"#T:{block.timestamp}") 273 | output.append(f"#S:{block.session_type}") 274 | output.append(f"#I:{block.main_issue}") 275 | 276 | # Consolidate problems and link solutions 277 | consolidated_problems = 
self._consolidate_problems(block.core_problems) 278 | linked_problems = self._link_solutions_to_problems( 279 | consolidated_problems, 280 | block.solution_approaches 281 | ) 282 | 283 | # Core problems and solutions with reasoning 284 | output.append("@CONTEXT{") 285 | output.append(json.dumps({ 286 | 'issues': linked_problems, 287 | 'reasoning': block.reasoning_chain 288 | }, indent=2)) 289 | output.append("}") 290 | 291 | # File changes with semantic meaning 292 | if block.key_files: 293 | output.append("@CHANGES{") 294 | output.append(json.dumps(block.key_files, indent=2)) 295 | output.append("}") 296 | 297 | # Dependencies and state changes 298 | if block.dependencies or block.state_changes: 299 | output.append("@STATE{") 300 | state_info = {} 301 | if block.dependencies: 302 | state_info['dependencies'] = block.dependencies 303 | if block.state_changes: 304 | state_info['changes'] = block.state_changes 305 | output.append(json.dumps(state_info, indent=2)) 306 | output.append("}") 307 | 308 | # Error context if present 309 | if block.error_context: 310 | output.append("@ERRORS{") 311 | # Organize discussion points 312 | organized = self._organize_discussion( 313 | block.error_context.get('discussion', []) 314 | ) 315 | output.append(json.dumps({ 316 | **block.error_context, 317 | 'discussion': organized 318 | }, indent=2)) 319 | output.append("}") 320 | 321 | output.append("") # Block separator 322 | 323 | return "\n".join(output) 324 | 325 | def _split_into_context_blocks(self, content: str) -> List[str]: 326 | """Split content into logical blocks based on context shifts.""" 327 | blocks = [] 328 | current_block = [] 329 | 330 | # Split on Human/Assistant markers and major section headers 331 | lines = content.split('\n') 332 | for line in lines: 333 | # New context indicators 334 | if (line.startswith(('Human:', 'Assistant:', '## ', '# ')) or 335 | re.match(r'^[A-Z][a-z]+ \d{1,2}, \d{4}', line)): 336 | if current_block: 337 | blocks.append('\n'.join(current_block)) 338 | current_block = [] 339 | current_block.append(line) 340 | 341 | # Add final block 342 | if current_block: 343 | blocks.append('\n'.join(current_block)) 344 | 345 | # Merge small blocks that are likely part of the same context 346 | merged_blocks = [] 347 | temp_block = [] 348 | for block in blocks: 349 | if len(block.split('\n')) < 5 and temp_block: # Small block 350 | temp_block.append(block) 351 | else: 352 | if temp_block: 353 | merged_blocks.append('\n'.join(temp_block)) 354 | temp_block = [] 355 | temp_block.append(block) 356 | 357 | if temp_block: 358 | merged_blocks.append('\n'.join(temp_block)) 359 | 360 | return merged_blocks 361 | 362 | def _extract_timestamp(self, block: str) -> str: 363 | """Extract timestamp from block.""" 364 | # Look for ISO format timestamps 365 | iso_pattern = r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:[+-]\d{2}:?\d{2})?' 
366 |         if match := re.search(iso_pattern, block):
367 |             return match.group(0)
368 | 
369 |         # Look for other common date/time formats, paired with strptime formats
370 |         patterns = [
371 |             (r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', "%Y-%m-%d %H:%M:%S"),
372 |             (r'\w+ \d{1,2}, \d{4} \d{1,2}:\d{2} (?:AM|PM)', "%B %d, %Y %I:%M %p"),
373 |         ]
374 |         for pattern, fmt in patterns:
375 |             if match := re.search(pattern, block):
376 |                 try:
377 |                     # Convert to ISO format; skip text strptime cannot parse
378 |                     return datetime.datetime.strptime(match.group(0), fmt).isoformat()
379 |                 except ValueError:
380 |                     continue
381 |         # Default to current time if no timestamp found
382 |         return datetime.datetime.now().isoformat()
383 | 
384 |     def _infer_session_type(self, block: str) -> str:
385 |         """Infer the type of development session."""
386 |         # Look for explicit indicators
387 |         type_indicators = {
388 |             'DEBUG': ['error', 'bug', 'fix', 'issue', 'problem', 'traceback'],
389 |             'FEATURE': ['implement', 'add', 'create', 'new feature'],
390 |             'REFACTOR': ['refactor', 'improve', 'optimize', 'clean', 'restructure'],
391 |             'TEST': ['test', 'verify', 'validate', 'check'],
392 |             'DOCS': ['document', 'explain', 'clarify', 'readme'],
393 |             'CONFIG': ['configure', 'setup', 'install', 'environment'],
394 |         }
395 | 
396 |         block_lower = block.lower()
397 |         for session_type, indicators in type_indicators.items():
398 |             if any(ind in block_lower for ind in indicators):
399 |                 return session_type
400 | 
401 |         # Look for code modifications
402 |         if '```' in block or 'def ' in block or 'class ' in block:
403 |             return 'CODE_MOD'
404 | 
405 |         return 'GENERAL'
406 | 
407 |     def _extract_main_issue(self, block: str) -> str:
408 |         """Extract the main issue being discussed."""
409 |         # First look for explicit issue statements
410 |         for pattern in self.issue_patterns:
411 |             if match := re.search(pattern, block, re.IGNORECASE):
412 |                 return match.group(1).strip()
413 | 
414 |         # Look for error traces
415 |         if 'Traceback' in block or 'Error:' in block:
416 |             if error_match := re.search(r'(?:Traceback.*?|Error:)(.*?)(?=\n\w|$)', block, re.DOTALL):
417 |                 return error_match.group(1).strip()
418 | 
419 |         # Look for task/goal statements
420 |         task_patterns = [
421 |             r'(?:need|want|trying) to\s+(.*?)(?:\.|$)',
422 |             r'(?:goal|task) is to\s+(.*?)(?:\.|$)',
423 |             r'(?:working on|implementing)\s+(.*?)(?:\.|$)'
424 |         ]
425 | 
426 |         for pattern in task_patterns:
427 |             if match := re.search(pattern, block, re.IGNORECASE):
428 |                 return match.group(1).strip()
429 | 
430 |         return ""
431 | 
432 |     def _extract_core_problems(self, block: str) -> List[str]:
433 |         """Extract core problems identified in the discussion."""
434 |         problems = []
435 | 
436 |         # Look for explicit problem statements
437 |         for pattern in self.issue_patterns:
438 |             matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE)
439 |             for match in matches:
440 |                 problem = match.group(1).strip()
441 |                 if len(problem) > 10:  # Filter out too short matches
442 |                     problems.append(problem)
443 | 
444 |         # Look for error traces
445 |         error_pattern = r'(?:Traceback.*?(?=\n\w|$)|Error:.*?(?=\n\w|$))'
446 |         if matches := re.finditer(error_pattern, block, re.DOTALL):
447 |             for match in matches:
448 |                 error = match.group(0).split('\n')[0]  # Get first line of traceback
449 |                 problems.append(error)
450 | 
451 |         # Look for "needs to" statements
452 |         needs_pattern = r'needs? 
to (?:be )?(.*?)(?:\.|$)' 453 | if matches := re.finditer(needs_pattern, block, re.IGNORECASE): 454 | for match in matches: 455 | problems.append(f"Needs {match.group(1)}") 456 | 457 | return list(set(problems)) # Remove duplicates 458 | 459 | def _extract_solutions(self, block: str) -> List[str]: 460 | """Extract proposed solutions and approaches.""" 461 | solutions = [] 462 | 463 | # Look for explicit solution patterns 464 | for pattern in self.solution_patterns: 465 | matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE) 466 | for match in matches: 467 | solution = match.group(1).strip() 468 | if len(solution) > 10: # Filter out too short matches 469 | solutions.append(solution) 470 | 471 | # Look for code blocks that implement solutions 472 | code_blocks = re.finditer(r'```.*?\n(.*?)```', block, re.DOTALL) 473 | for block_match in code_blocks: 474 | code = block_match.group(1) 475 | # Extract function/class definitions as solutions 476 | if def_match := re.search(r'def (\w+)', code): 477 | solutions.append(f"Implement {def_match.group(1)} function") 478 | if class_match := re.search(r'class (\w+)', code): 479 | solutions.append(f"Create {class_match.group(1)} class") 480 | 481 | return list(set(solutions)) # Remove duplicates 482 | 483 | def _extract_file_changes(self, block: str) -> Dict[str, List[str]]: 484 | """Extract file changes with semantic meaning.""" 485 | changes = {} 486 | 487 | # Look for file paths and associated changes 488 | file_patterns = [ 489 | r'(?:in|update|modify|create|edit)\s+`?([/\w.-]+\.[/\w.-]+)`?', 490 | r'(?:file|path):\s*`?([/\w.-]+\.[/\w.-]+)`?', 491 | r'(?:^|\s)`?([/\w.-]+\.[/\w.-]+)`?:\s*\w+', 492 | ] 493 | 494 | for pattern in file_patterns: 495 | matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE) 496 | for match in matches: 497 | file_path = match.group(1) 498 | if file_path not in changes: 499 | changes[file_path] = [] 500 | 501 | # Look for associated changes in surrounding context 502 | context = block[max(0, match.start() - 100):min(len(block), match.end() + 100)] 503 | 504 | # Extract change types 505 | change_types = [] 506 | if 'add' in context.lower() or 'create' in context.lower(): 507 | change_types.append('added') 508 | if 'update' in context.lower() or 'modify' in context.lower(): 509 | change_types.append('modified') 510 | if 'remove' in context.lower() or 'delete' in context.lower(): 511 | change_types.append('removed') 512 | if 'fix' in context.lower(): 513 | change_types.append('fixed') 514 | 515 | changes[file_path].extend(change_types) 516 | 517 | return changes 518 | 519 | def _extract_dependencies(self, block: str) -> Dict[str, List[str]]: 520 | """Extract component dependencies.""" 521 | deps = {} 522 | 523 | # Look for import statements in code blocks 524 | code_blocks = re.finditer(r'```.*?\n(.*?)```', block, re.DOTALL) 525 | for block_match in code_blocks: 526 | code = block_match.group(1) 527 | # Extract imports 528 | imports = re.finditer(r'(?:from|import)\s+([\w.]+)(?:\s+import\s+)?', code) 529 | for imp in imports: 530 | module = imp.group(1) 531 | if '.' in module: 532 | parent = module.split('.')[0] 533 | if parent not in deps: 534 | deps[parent] = [] 535 | deps[parent].append(module) 536 | else: 537 | if 'external' not in deps: 538 | deps['external'] = [] 539 | deps['external'].append(module) 540 | 541 | # Look for dependency mentions in text 542 | dep_patterns = [ 543 | r'depends? 
on\s+`?([\w.]+)`?', 544 | r'requires?\s+`?([\w.]+)`?', 545 | r'using\s+`?([\w.]+)`?', 546 | ] 547 | 548 | for pattern in dep_patterns: 549 | matches = re.finditer(pattern, block, re.IGNORECASE) 550 | for match in matches: 551 | dep = match.group(1) 552 | if 'mentioned' not in deps: 553 | deps['mentioned'] = [] 554 | deps['mentioned'].append(dep) 555 | 556 | return deps 557 | 558 | def _extract_reasoning_chain(self, block: str) -> List[Dict[str, Any]]: 559 | """Extract the chain of reasoning in the discussion.""" 560 | chain = [] 561 | 562 | # Look for cause-effect relationships 563 | for pattern in self.reasoning_patterns: 564 | matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE) 565 | for match in matches: 566 | reasoning = match.group(1).strip() 567 | # Look for action and result in surrounding context 568 | context = block[max(0, match.start() - 100):min(len(block), match.end() + 100)] 569 | 570 | # Try to identify action and result 571 | action = "" 572 | if action_match := re.search(r'(?:will|should|must|going to)\s+(.*?)(?:\s+because|\s+since|$)', context): 573 | action = action_match.group(1) 574 | 575 | result = "" 576 | if result_match := re.search(r'(?:this will|resulting in|leads to)\s+(.*?)(?:\.|$)', context): 577 | result = result_match.group(1) 578 | 579 | if action or result: 580 | chain.append({ 581 | "action": action or "unspecified", 582 | "reason": reasoning, 583 | "result": result or "unspecified" 584 | }) 585 | 586 | return chain 587 | 588 | def _extract_state_changes(self, block: str) -> Dict[str, Any]: 589 | """Extract important state changes.""" 590 | changes = {} 591 | 592 | # Look for state change indicators 593 | state_patterns = { 594 | 'status': r'status(?:\s+is|\s+changed\s+to)?\s+`?([\w_]+)`?', 595 | 'phase': r'phase(?:\s+is|\s+moved\s+to)?\s+`?([\w_]+)`?', 596 | 'version': r'version(?:\s+is|\s+updated\s+to)?\s+`?([\w.]+)`?', 597 | 'config': r'config(?:\s+is|\s+set\s+to)?\s+`?([\w_]+)`?', 598 | } 599 | 600 | for key, pattern in state_patterns.items(): 601 | if match := re.search(pattern, block, re.IGNORECASE): 602 | changes[key] = match.group(1) 603 | 604 | # Look for variable assignments in code 605 | code_blocks = re.finditer(r'```.*?\n(.*?)```', block, re.DOTALL) 606 | for block_match in code_blocks: 607 | code = block_match.group(1) 608 | assignments = re.finditer(r'(\w+)\s*=\s*([^;\n]+)', code) 609 | for assign in assignments: 610 | var, value = assign.groups() 611 | if var.isupper(): # Likely a constant/config 612 | changes[f"code_{var.lower()}"] = value.strip() 613 | 614 | return changes 615 | 616 | def _extract_error_context(self, block: str) -> Dict[str, List[str]]: 617 | """Extract error context and related information.""" 618 | context = {} 619 | 620 | # Look for error traces 621 | if 'Traceback' in block or 'Error:' in block: 622 | traces = re.finditer(r'(?:Traceback.*?(?=\n\w|$)|Error:.*?(?=\n\w|$))', block, re.DOTALL) 623 | for trace in traces: 624 | error_text = trace.group(0) 625 | 626 | # Extract error type 627 | if type_match := re.search(r'(\w+Error):', error_text): 628 | error_type = type_match.group(1) 629 | if error_type not in context: 630 | context[error_type] = [] 631 | 632 | # Extract relevant lines 633 | lines = error_text.split('\n') 634 | context[error_type].extend([ 635 | line.strip() for line in lines 636 | if line.strip() and not line.startswith(' ') 637 | ]) 638 | 639 | # Look for file references 640 | files = re.finditer(r'File "([^"]+)", line (\d+)', error_text) 641 | for file_match in files: 642 | 
context[error_type].append(f"In {file_match.group(1)}:{file_match.group(2)}") 643 | 644 | # Look for error-related discussion 645 | error_discussion = re.finditer(r'(?:error|issue|bug|problem).*?:\s*(.*?)(?:\n|$)', block, re.IGNORECASE) 646 | for disc in error_discussion: 647 | if 'discussion' not in context: 648 | context['discussion'] = [] 649 | context['discussion'].append(disc.group(1).strip()) 650 | 651 | return context 652 | 653 | def _consolidate_problems(self, problems: List[str]) -> List[Dict[str, Any]]: 654 | """Consolidate similar problems into higher-level issues with context.""" 655 | # First group by root cause 656 | root_causes = { 657 | 'import': { 658 | 'pattern': r'(?:import|from)\s+([^\s]+)', 659 | 'problems': [], 660 | 'affected_modules': set() 661 | }, 662 | 'initialization': { 663 | 'pattern': r'(?:init|create|setup)\s+([^\s]+)', 664 | 'problems': [], 665 | 'affected_components': set() 666 | }, 667 | 'validation': { 668 | 'pattern': r'(?:valid|schema|type)\s+([^\s]+)', 669 | 'problems': [], 670 | 'affected_fields': set() 671 | }, 672 | 'undefined': { 673 | 'pattern': r'name\s+\'([^\']+)\'\s+is not defined', 674 | 'problems': [], 675 | 'missing_names': set() 676 | }, 677 | 'attribute': { 678 | 'pattern': r'has no attribute\s+\'([^\']+)\'', 679 | 'problems': [], 680 | 'missing_attrs': set() 681 | }, 682 | 'other': { 683 | 'pattern': None, 684 | 'problems': [], 685 | 'context': set() 686 | } 687 | } 688 | 689 | # Categorize each problem 690 | for problem in problems: 691 | problem = problem.strip() 692 | if not problem: 693 | continue 694 | 695 | matched = False 696 | for category, info in root_causes.items(): 697 | if category == 'other': 698 | continue 699 | 700 | if matches := re.finditer(info['pattern'], problem, re.IGNORECASE): 701 | for match in matches: 702 | matched = True 703 | info['problems'].append(problem) 704 | if category == 'import': 705 | info['affected_modules'].add(match.group(1)) 706 | elif category == 'initialization': 707 | info['affected_components'].add(match.group(1)) 708 | elif category == 'validation': 709 | info['affected_fields'].add(match.group(1)) 710 | elif category == 'undefined': 711 | info['missing_names'].add(match.group(1)) 712 | elif category == 'attribute': 713 | info['missing_attrs'].add(match.group(1)) 714 | 715 | if not matched: 716 | root_causes['other']['problems'].append(problem) 717 | 718 | # Convert to final format 719 | consolidated = [] 720 | for category, info in root_causes.items(): 721 | if not info['problems']: 722 | continue 723 | 724 | issue = { 725 | 'type': category, 726 | 'summary': self._generate_summary(category, info), 727 | 'details': list(set(info['problems'])), # Remove duplicates 728 | 'context': {} 729 | } 730 | 731 | # Add category-specific context 732 | if category == 'import': 733 | issue['context']['modules'] = list(info['affected_modules']) 734 | elif category == 'initialization': 735 | issue['context']['components'] = list(info['affected_components']) 736 | elif category == 'validation': 737 | issue['context']['fields'] = list(info['affected_fields']) 738 | elif category == 'undefined': 739 | issue['context']['names'] = list(info['missing_names']) 740 | elif category == 'attribute': 741 | issue['context']['attributes'] = list(info['missing_attrs']) 742 | elif category == 'other': 743 | issue['context']['general'] = list(info['context']) 744 | 745 | consolidated.append(issue) 746 | 747 | return consolidated 748 | 749 | def _generate_summary(self, category: str, info: Dict[str, Any]) -> str: 750 | 
"""Generate a concise summary of the issue category.""" 751 | if category == 'import': 752 | modules = ', '.join(info['affected_modules']) 753 | return f"Import issues with modules: {modules}" 754 | elif category == 'initialization': 755 | components = ', '.join(info['affected_components']) 756 | return f"Initialization issues in components: {components}" 757 | elif category == 'validation': 758 | fields = ', '.join(info['affected_fields']) 759 | return f"Validation issues with fields: {fields}" 760 | elif category == 'undefined': 761 | names = ', '.join(info['missing_names']) 762 | return f"Undefined names: {names}" 763 | elif category == 'attribute': 764 | attrs = ', '.join(info['missing_attrs']) 765 | return f"Missing attributes: {attrs}" 766 | else: 767 | return "Other issues found" 768 | 769 | def _link_solutions_to_problems(self, problems: List[Dict[str, Any]], solutions: List[str]) -> List[Dict[str, Any]]: 770 | """Link solutions to their corresponding problems.""" 771 | linked_problems = copy.deepcopy(problems) 772 | 773 | for problem in linked_problems: 774 | problem['solutions'] = [] 775 | problem_text = ' '.join([problem['summary']] + problem['details']).lower() 776 | 777 | for solution in solutions: 778 | # Look for solutions that mention the problem's context 779 | if any(word.lower() in solution.lower() for word in problem['context'].get('modules', [])): 780 | problem['solutions'].append(solution) 781 | elif any(word.lower() in solution.lower() for word in problem['context'].get('components', [])): 782 | problem['solutions'].append(solution) 783 | elif any(word.lower() in solution.lower() for word in problem['context'].get('fields', [])): 784 | problem['solutions'].append(solution) 785 | elif any(word.lower() in solution.lower() for word in problem['context'].get('names', [])): 786 | problem['solutions'].append(solution) 787 | elif any(word.lower() in solution.lower() for word in problem['context'].get('attributes', [])): 788 | problem['solutions'].append(solution) 789 | # Look for solutions that mention keywords from the problem 790 | elif any(word in solution.lower() for word in problem_text.split()): 791 | problem['solutions'].append(solution) 792 | 793 | return linked_problems 794 | 795 | def _organize_discussion(self, points: List[str]) -> Dict[str, List[str]]: 796 | """Organize discussion points into categories.""" 797 | organized = { 798 | 'analysis': [], 799 | 'changes': [], 800 | 'errors': [], 801 | 'solutions': [], 802 | 'verifications': [] 803 | } 804 | 805 | for point in points: 806 | point = point.strip() 807 | # Skip empty or uninformative points 808 | if not point or point in ['Analyzed', 'Edited', 'Edit:', 'CopyInsert']: 809 | continue 810 | 811 | # Categorize based on content 812 | if any(word in point.lower() for word in ['error', 'exception', 'fail']): 813 | organized['errors'].append(point) 814 | elif any(word in point.lower() for word in ['update', 'change', 'modify']): 815 | organized['changes'].append(point) 816 | elif any(word in point.lower() for word in ['fix', 'solve', 'resolve']): 817 | organized['solutions'].append(point) 818 | elif any(word in point.lower() for word in ['check', 'verify', 'test']): 819 | organized['verifications'].append(point) 820 | else: 821 | organized['analysis'].append(point) 822 | 823 | # Remove empty categories 824 | return {k: v for k, v in organized.items() if v} 825 | 826 | def main(): 827 | """Main entry point for the script.""" 828 | parser = argparse.ArgumentParser( 829 | description='Convert chat logs to SPR (Sparse 
Priming Representation) format',
830 |         formatter_class=argparse.RawDescriptionHelpFormatter,
831 |         epilog="""
832 | Example usage:
833 |   %(prog)s chat_history.txt
834 |   %(prog)s chat_history.txt -o custom_output.spr
835 |   %(prog)s chat_history.txt -v
836 |         """
837 |     )
838 |     parser.add_argument(
839 |         'input_file',
840 |         type=str,
841 |         help='Path to the input chat log file'
842 |     )
843 |     parser.add_argument(
844 |         '-o', '--output',
845 |         type=str,
846 |         default=None,
847 |         help='Path to the output SPR file (default: input_file.spr.txt)'
848 |     )
849 |     parser.add_argument(
850 |         '-v', '--verbose',
851 |         action='store_true',
852 |         help='Enable verbose logging'
853 |     )
854 |     parser.add_argument(
855 |         '--debug',
856 |         action='store_true',
857 |         help='Enable debug mode with additional output'
858 |     )
859 | 
860 |     args = parser.parse_args()
861 | 
862 |     # Configure logging based on arguments
863 |     if args.debug:
864 |         logging.getLogger().setLevel(logging.DEBUG)
865 |     elif args.verbose:
866 |         logging.getLogger().setLevel(logging.INFO)
867 |     else:
868 |         logging.getLogger().setLevel(logging.WARNING)
869 | 
870 |     try:
871 |         input_path = Path(args.input_file)
872 |         if not input_path.exists():
873 |             logger.error(f"Input file not found: {input_path}")
874 |             return 1
875 | 
876 |         # Determine output path
877 |         output_path = args.output
878 |         if output_path is None:
879 |             output_path = input_path.with_suffix('.spr.txt')
880 |         output_path = Path(output_path)
881 | 
882 |         # Create output directory if it doesn't exist
883 |         output_path.parent.mkdir(parents=True, exist_ok=True)
884 | 
885 |         logger.info(f"Processing {input_path}")
886 |         logger.info(f"Output will be written to {output_path}")
887 | 
888 |         # Parse and generate SPR
889 |         chat_parser = ChatLogParser(str(input_path))
890 |         chat_parser.parse_log()
891 |         spr = chat_parser.generate_spr()
892 | 
893 |         # Write output
894 |         output_path.write_text(spr)
895 |         logger.info(f"Successfully generated SPR format at {output_path}")
896 | 
897 |         if args.debug:
898 |             # Print some statistics in debug mode
899 |             logger.debug(f"Found {len(chat_parser.context_blocks)} context blocks")
900 | 
901 |         return 0
902 | 
903 |     except Exception as e:
904 |         logger.error(f"Error processing chat log: {str(e)}")
905 |         if args.debug:
906 |             logger.exception("Detailed error trace:")
907 |         return 1
908 | 
909 | if __name__ == "__main__":
910 |     exit(main())
911 | 
--------------------------------------------------------------------------------