├── .gitignore
├── README.md
└── cli.py

/.gitignore:
--------------------------------------------------------------------------------
 1 | # macOS system files
 2 | .DS_Store
 3 | .AppleDouble
 4 | .LSOverride
 5 | Icon
 6 | ._*
 7 | .DocumentRevisions-V100
 8 | .fseventsd
 9 | .Spotlight-V100
10 | .TemporaryItems
11 | .Trashes
12 | .VolumeIcon.icns
13 | .com.apple.timemachine.donotpresent
14 | .AppleDB
15 | .AppleDesktop
16 | Network Trash Folder
17 | Temporary Items
18 | .apdisk
19 | 
20 | # Python
21 | __pycache__/
22 | *.py[cod]
23 | *$py.class
24 | *.so
25 | .Python
26 | build/
27 | develop-eggs/
28 | dist/
29 | downloads/
30 | eggs/
31 | .eggs/
32 | lib/
33 | lib64/
34 | parts/
35 | sdist/
36 | var/
37 | wheels/
38 | *.egg-info/
39 | .installed.cfg
40 | *.egg
41 | MANIFEST
42 | .env
43 | .venv
44 | env/
45 | venv/
46 | ENV/
47 | env.bak/
48 | venv.bak/
49 | .python-version
50 | .pytest_cache/
51 | .coverage
52 | htmlcov/
53 | 
54 | # IDEs and editors
55 | .idea/
56 | .vscode/
57 | *.swp
58 | *.swo
59 | *~
60 | .project
61 | .classpath
62 | .settings/
63 | *.sublime-workspace
64 | *.sublime-project
65 | 
66 | # Project-specific files
67 | *.txt
68 | *.log
69 | *.spr
70 | *.spr.txt
71 | 
72 | # Keep specific text files
73 | !requirements.txt
74 | !.env.example
75 | !LICENSE.txt
76 | !README.txt
77 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # codeiumSPR
 2 | 
 3 | Because AI Assistants Shouldn't Have Goldfish Memory
 4 | 
 5 | A specialized tool for optimizing chat history context in Windsurf IDE.
 6 | 
 7 | This tool processes chat logs into a Sparse Priming Representation
 8 | (SPR) format, enabling better context retention and understanding across coding
 9 | sessions.
10 | 
11 | 
12 | 
13 | ## Hey Kids!
14 | 
15 | Ever had your AI coding assistant forget what you were working on faster than
16 | you forget your keys? Say hello to codeiumSPR :)
17 | 
18 | Using my super-duper-not-a-pooper-scooper Sparse Priming Representation (SPR)
19 | script, you can:
20 | 
21 | - Bring the agent up to speed quicker than your cat knocking stuff off tables
22 |   (unlike your ex)
23 | - Keep context around longer than your New Year's resolutions
24 | - Parse chat histories smoother than your pickup lines
25 | 
26 | But wait, there's more! Act now and you'll get:
27 | 
28 | - A new session that doesn't ask "wait, what were we doing again?"
29 | - A context window larger than your grandma's bumbum
30 | - Error tracking better than your excuses for missed deadlines
31 | 
32 | P.S. Side effects may include: actually finishing your projects, fewer face-palm
33 | moments, a suspicious amount of productive coding sessions, explosive diarrhea,
34 | nausea, headaches, spontaneous combustion, and an inexplicable urge to
35 | high-five your rubber duck.
36 | 
37 | P.P.S. No goldfish were harmed in the making of this neat little script. They
38 | just helped with the memory testing. 🐠
39 | 
40 | 
41 | ## Features
42 | 
43 | - Semantic parsing of chat histories
44 | - Problem and solution linking
45 | - State change tracking
46 | - Dependency analysis
47 | - Error context preservation
48 | 
49 | ## Installation
50 | 
51 | ```bash
52 | # Clone the repository
53 | git clone https://github.com/claytondukes/codeiumSPR.git
54 | 
55 | # Navigate to the directory
56 | cd codeiumSPR
57 | ```
58 | 
59 | ## Usage
60 | 
61 | 1. Copy/paste your entire chat history into a text file
62 | 2. Run the parser:
63 | 
64 | ```bash
65 | python cli.py chat_history.txt
66 | ```
67 | 
68 | The tool will generate a `chat_history.spr.txt` file containing the optimized
69 | context.
70 | 
71 | On your next session, tell it to read the file(s) and you're good to go!
72 | 
73 | Example:
74 | 
75 | ```text
76 | Check the following for our session histories, in order of oldest to newest:
77 | @history.spr.txt @history2.spr.txt @history3.spr.txt @history4.spr.txt @history5.spr.txt
78 | Then read all @docs
79 | ```
80 | 
81 | ## SPR Format
82 | 
83 | The Sparse Priming Representation (SPR) uses the following structure (a complete worked example appears in the Example Output section below):
84 | 
85 | ### Metadata Markers
86 | 
87 | - `#T`: Timestamp (ISO 8601 format)
88 | - `#S`: Session type (DEBUG, FEATURE, REFACTOR, etc.)
89 | - `#I`: Main issue or task
90 | 
91 | ### Context Blocks
92 | 
93 | ```text
94 | @CONTEXT{
95 |   "issues": [
96 |     {
97 |       "type": "category",
98 |       "summary": "Concise problem description",
99 |       "details": ["Detailed information"],
100 |       "context": {"relevant": "metadata"}
101 |     }
102 |   ],
103 |   "reasoning": ["Chain of thought"]
104 | }
105 | ```
106 | 
107 | ### Change Tracking
108 | 
109 | ```text
110 | @CHANGES{
111 |   "file_path": ["modifications"]
112 | }
113 | ```
114 | 
115 | ### State Management
116 | 
117 | ```text
118 | @STATE{
119 |   "dependencies": ["required components"],
120 |   "changes": ["state modifications"]
121 | }
122 | ```
123 | 
124 | ### Error Context
125 | 
126 | ```text
127 | @ERRORS{
128 |   "type": "error_category",
129 |   "discussion": {
130 |     "analysis": ["investigation steps"],
131 |     "solutions": ["proposed fixes"]
132 |   }
133 | }
134 | ```
135 | 
136 | ## Development
137 | 
138 | ### Prerequisites
139 | 
140 | - Python 3.8+
141 | - pip package manager (no third-party packages are currently required)
142 | - Git
143 | 
144 | ## Contributing
145 | 
146 | 1. Fork the repository
147 | 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
148 | 3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
149 | 4. Push to the branch (`git push origin feature/amazing-feature`)
150 | 5. Open a Pull Request
151 | 
152 | ## Command Line Options
153 | 
154 | ```bash
155 | python cli.py [-h] [-v] [--debug] [-o OUTPUT] input_file
156 | ```
157 | 
158 | Arguments:
159 | 
160 | - `input_file`: Path to the chat history file to parse
161 | - `-h, --help`: Show this help message and exit
162 | - `-v, --verbose`: Enable verbose output for debugging
163 | - `--debug`: Enable debug mode with additional logging
164 | - `-o, --output`: Specify output file path (default: input_file.spr.txt)
165 | 
166 | ## License
167 | 
168 | This project is licensed under the MIT License - see the LICENSE file for details.
169 | 
170 | ## Support
171 | 
172 | Pshaw...
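173 | 
174 | ## Example Output
175 | 
176 | Here's a hand-written illustration of the shape of a single SPR block (not
177 | generated from a real session, and with the JSON pretty-printing condensed):
178 | 
179 | ```text
180 | #T:2024-12-05T14:32:11
181 | #S:DEBUG
182 | #I:ModuleNotFoundError: No module named 'requests'
183 | @CONTEXT{
184 | {
185 |   "issues": [
186 |     {
187 |       "type": "import",
188 |       "summary": "Import issues with modules: requests",
189 |       "details": ["ModuleNotFoundError: No module named 'requests'"],
190 |       "context": {"modules": ["requests"]},
191 |       "solutions": ["install requests in the project environment"]
192 |     }
193 |   ],
194 |   "reasoning": []
195 | }
196 | }
197 | @CHANGES{
198 | {
199 |   "api/server.py": ["modified", "fixed"]
200 | }
201 | }
202 | ```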
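203 | 
204 | ## Using the Parser from Python
205 | 
206 | The CLI is a thin wrapper around the `ChatLogParser` class in `cli.py`, so you
207 | can also drive it from your own scripts:
208 | 
209 | ```python
210 | from cli import ChatLogParser
211 | 
212 | chat_parser = ChatLogParser("chat_history.txt")  # path to your exported chat log
213 | chat_parser.parse_log()                          # split the log into context blocks
214 | spr_text = chat_parser.generate_spr()            # render the SPR string
215 | print(spr_text)
216 | ```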
217 | 
218 | ## Acknowledgments
219 | 
220 | - Windsurf IDE team
221 | - Codeium engineering team
222 | - Open source contributors
223 | 
224 | 
--------------------------------------------------------------------------------
/cli.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | import re
  4 | import datetime
  5 | import logging
  6 | import argparse
  7 | import json
  8 | from dataclasses import dataclass, field
  9 | from typing import List, Dict, Set, Optional, Union, Any
 10 | from pathlib import Path
 11 | from enum import Enum
 12 | import ast
 13 | from collections import defaultdict
 14 | import copy
 15 | 
 16 | # Configure logging
 17 | logging.basicConfig(
 18 |     level=logging.INFO,
 19 |     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
 20 | )
 21 | logger = logging.getLogger(__name__)
 22 | 
 23 | class ActionType(Enum):
 24 |     """Types of actions that can occur in the chat."""
 25 |     MOD = "MOD"            # Modification
 26 |     DISC = "DISC"          # Discussion
 27 |     DOC = "DOC"            # Documentation
 28 |     VERIFY = "VERIFY"      # Verification
 29 |     FIX = "FIX"            # Bug fix
 30 |     REFACTOR = "REFACTOR"  # Code refactor
 31 |     TEST = "TEST"          # Testing
 32 |     CONFIG = "CONFIG"      # Configuration
 33 | 
 34 | class ComponentType(Enum):
 35 |     """Types of system components that can be referenced."""
 36 |     API = "api"
 37 |     DATA = "data"
 38 |     DOCS = "docs"
 39 |     SCHEMA = "schema"
 40 |     BUILD = "build"
 41 |     ROUTE = "route"
 42 |     MODEL = "model"
 43 |     SERVICE = "service"
 44 |     CONFIG = "config"
 45 | 
 46 | @dataclass
 47 | class FileReference:
 48 |     """Reference to a file and its associated metadata."""
 49 |     path: str
 50 |     component: ComponentType
 51 |     changes: List[str] = field(default_factory=list)
 52 |     impacts: List[str] = field(default_factory=list)
 53 | 
 54 | @dataclass
 55 | class ErrorTrace:
 56 |     """Details of an error occurrence."""
 57 |     type: str
 58 |     message: str
 59 |     file: Optional[str] = None
 60 |     line: Optional[int] = None
 61 |     stack: List[str] = field(default_factory=list)
 62 | 
 63 | @dataclass
 64 | class HistoryEntry:
 65 |     """A single entry in the chat history."""
 66 |     timestamp: str
 67 |     action_type: ActionType
 68 |     target: str
 69 |     files: List[FileReference]
 70 |     impacts: List[str]
 71 |     actions: List[str]
 72 |     error_traces: List[ErrorTrace] = field(default_factory=list)
 73 |     discussion_points: List[str] = field(default_factory=list)
 74 | 
 75 | @dataclass
 76 | class Context:
 77 |     """Context information for the current state."""
 78 |     focus: str
 79 |     dependencies: List[str]
 80 |     schema_version: str
 81 |     data_version: str
 82 |     issues: List[str]
 83 |     tools: List[str]
 84 |     configs: Dict[str, str]
 85 |     artifacts: List[str]
 86 | 
 87 | @dataclass
 88 | class CurrentState:
 89 |     """Current state of the system/conversation."""
 90 |     task: str
 91 |     state: str
 92 |     modifications: List[FileReference]
 93 |     next_steps: List[str]
 94 |     blockers: List[str]
 95 | 
 96 | @dataclass
 97 | class ContextBlock:
 98 |     """AI-optimized context block for session understanding."""
 99 |     timestamp: str
100 |     session_type: str
101 |     main_issue: str
102 |     core_problems: List[str]
103 |     solution_approaches: List[str]
104 |     key_files: Dict[str, List[str]]  # file -> [changes]
105 |     dependencies: Dict[str, List[str]]  # component -> [dependencies]
106 |     reasoning_chain: List[Dict[str, Any]]  # List of {action, reason, result}
107 |     state_changes: Dict[str, Any]  # Track important state changes
108 |     error_context: Dict[str, List[str]]  # error_type -> [relevant_context]
109 | 
110 | class CodeBlockParser:
111 |     """Parses code blocks 
and their context from chat messages.""" 112 | 113 | def __init__(self): 114 | self.language_patterns = { 115 | 'python': r'```python\n(.*?)\n```', 116 | 'json': r'```json\n(.*?)\n```', 117 | 'javascript': r'```javascript\n(.*?)\n```', 118 | 'typescript': r'```typescript\n(.*?)\n```', 119 | 'bash': r'```bash\n(.*?)\n```', 120 | 'sql': r'```sql\n(.*?)\n```', 121 | # Support for inline code 122 | 'inline': r'`([^`]+)`', 123 | # Support for code without language specification 124 | 'generic': r'```\n(.*?)\n```' 125 | } 126 | 127 | def extract_code_blocks(self, text: str) -> List[Dict[str, str]]: 128 | """ 129 | Extract all code blocks from text. 130 | 131 | Args: 132 | text: The text to parse 133 | 134 | Returns: 135 | List of dicts containing language, code, and position info 136 | """ 137 | blocks = [] 138 | for lang, pattern in self.language_patterns.items(): 139 | matches = re.finditer(pattern, text, re.DOTALL) 140 | for match in matches: 141 | blocks.append({ 142 | 'language': lang, 143 | 'code': match.group(1).strip(), 144 | 'start': match.start(), 145 | 'end': match.end(), 146 | 'full_match': match.group(0) 147 | }) 148 | return sorted(blocks, key=lambda x: x['start']) 149 | 150 | class ErrorParser: 151 | """Parses error traces and exception details from chat messages.""" 152 | 153 | def __init__(self): 154 | self.error_patterns = { 155 | 'import': r'ImportError: (.*?)(?:\n|$)', 156 | 'type': r'TypeError: (.*?)(?:\n|$)', 157 | 'value': r'ValueError: (.*?)(?:\n|$)', 158 | 'attribute': r'AttributeError: (.*?)(?:\n|$)', 159 | 'key': r'KeyError: (.*?)(?:\n|$)', 160 | 'index': r'IndexError: (.*?)(?:\n|$)', 161 | 'name': r'NameError: (.*?)(?:\n|$)', 162 | 'syntax': r'SyntaxError: (.*?)(?:\n|$)', 163 | 'runtime': r'RuntimeError: (.*?)(?:\n|$)', 164 | 'assertion': r'AssertionError: (.*?)(?:\n|$)', 165 | 'indentation': r'IndentationError: (.*?)(?:\n|$)', 166 | 'os': r'OSError: (.*?)(?:\n|$)', 167 | 'io': r'IOError: (.*?)(?:\n|$)', 168 | 'permission': r'PermissionError: (.*?)(?:\n|$)', 169 | 'file_not_found': r'FileNotFoundError: (.*?)(?:\n|$)', 170 | 'module_not_found': r'ModuleNotFoundError: (.*?)(?:\n|$)', 171 | } 172 | self.file_line_pattern = r'File "([^"]+)", line (\d+)' 173 | self.traceback_pattern = r'Traceback \(most recent call last\):\n(.*?)(?:\n\n|$)' 174 | 175 | def parse_error(self, text: str) -> Optional[ErrorTrace]: 176 | """ 177 | Parse error information from text. 
178 | 179 | Args: 180 | text: The text to parse 181 | 182 | Returns: 183 | ErrorTrace object if an error is found, None otherwise 184 | """ 185 | try: 186 | for error_type, pattern in self.error_patterns.items(): 187 | match = re.search(pattern, text, re.DOTALL) 188 | if match: 189 | error = ErrorTrace( 190 | type=error_type, 191 | message=match.group(1).strip() 192 | ) 193 | 194 | # Extract file and line info 195 | file_match = re.search(self.file_line_pattern, text) 196 | if file_match: 197 | error.file = file_match.group(1) 198 | error.line = int(file_match.group(2)) 199 | 200 | # Extract stack trace 201 | trace_match = re.search(self.traceback_pattern, text, re.DOTALL) 202 | if trace_match: 203 | error.stack = [ 204 | line.strip() 205 | for line in trace_match.group(1).split('\n') 206 | if line.strip() 207 | ] 208 | 209 | return error 210 | 211 | except Exception as e: 212 | logger.warning(f"Error parsing error trace: {str(e)}") 213 | 214 | return None 215 | 216 | class ChatLogParser: 217 | """Parses chat logs into structured SPR format optimized for AI consumption.""" 218 | 219 | def __init__(self, file_path: str): 220 | self.file_path = Path(file_path) 221 | self.context_blocks: List[ContextBlock] = [] 222 | self.current_block = None 223 | self.code_parser = CodeBlockParser() 224 | self.error_parser = ErrorParser() 225 | 226 | # Enhanced patterns for AI context extraction 227 | self.issue_patterns = [ 228 | r'(?:error|issue|problem|bug).*?:\s*(.*?)(?:\n|$)', 229 | r'(?:fails?|breaks?|doesn\'t work).*?(?:because|when|if)\s*(.*?)(?:\n|$)', 230 | r'(?:need|should|must)\s+(?:to\s+)?(?:fix|resolve|address)\s*(.*?)(?:\n|$)' 231 | ] 232 | self.solution_patterns = [ 233 | r'(?:fix|solve|resolve)\s+(?:this|the|that)\s+by\s*(.*?)(?:\n|$)', 234 | r'(?:let|going|need)\s+(?:me|to)\s*(?:try|implement|add|update)\s*(.*?)(?:\n|$)', 235 | r'(?:solution|approach|fix)\s+(?:is|would be)\s+to\s*(.*?)(?:\n|$)' 236 | ] 237 | self.reasoning_patterns = [ 238 | r'(?:because|since|as)\s*(.*?)(?:\n|$)', 239 | r'(?:this|that|it)\s+(?:means|implies|suggests)\s*(.*?)(?:\n|$)', 240 | r'(?:the|this|that)\s+(?:leads to|results in|causes)\s*(.*?)(?:\n|$)' 241 | ] 242 | 243 | def parse_log(self): 244 | """Parse the chat log into AI-optimized context blocks.""" 245 | with open(self.file_path, 'r') as f: 246 | content = f.read() 247 | 248 | # Split into logical blocks based on context shifts 249 | blocks = self._split_into_context_blocks(content) 250 | 251 | for block in blocks: 252 | context = ContextBlock( 253 | timestamp=self._extract_timestamp(block), 254 | session_type=self._infer_session_type(block), 255 | main_issue=self._extract_main_issue(block), 256 | core_problems=self._extract_core_problems(block), 257 | solution_approaches=self._extract_solutions(block), 258 | key_files=self._extract_file_changes(block), 259 | dependencies=self._extract_dependencies(block), 260 | reasoning_chain=self._extract_reasoning_chain(block), 261 | state_changes=self._extract_state_changes(block), 262 | error_context=self._extract_error_context(block) 263 | ) 264 | self.context_blocks.append(context) 265 | 266 | def generate_spr(self) -> str: 267 | """Generate AI-optimized SPR format.""" 268 | output = [] 269 | 270 | for block in self.context_blocks: 271 | # Metadata section 272 | output.append(f"#T:{block.timestamp}") 273 | output.append(f"#S:{block.session_type}") 274 | output.append(f"#I:{block.main_issue}") 275 | 276 | # Consolidate problems and link solutions 277 | consolidated_problems = 
self._consolidate_problems(block.core_problems) 278 | linked_problems = self._link_solutions_to_problems( 279 | consolidated_problems, 280 | block.solution_approaches 281 | ) 282 | 283 | # Core problems and solutions with reasoning 284 | output.append("@CONTEXT{") 285 | output.append(json.dumps({ 286 | 'issues': linked_problems, 287 | 'reasoning': block.reasoning_chain 288 | }, indent=2)) 289 | output.append("}") 290 | 291 | # File changes with semantic meaning 292 | if block.key_files: 293 | output.append("@CHANGES{") 294 | output.append(json.dumps(block.key_files, indent=2)) 295 | output.append("}") 296 | 297 | # Dependencies and state changes 298 | if block.dependencies or block.state_changes: 299 | output.append("@STATE{") 300 | state_info = {} 301 | if block.dependencies: 302 | state_info['dependencies'] = block.dependencies 303 | if block.state_changes: 304 | state_info['changes'] = block.state_changes 305 | output.append(json.dumps(state_info, indent=2)) 306 | output.append("}") 307 | 308 | # Error context if present 309 | if block.error_context: 310 | output.append("@ERRORS{") 311 | # Organize discussion points 312 | organized = self._organize_discussion( 313 | block.error_context.get('discussion', []) 314 | ) 315 | output.append(json.dumps({ 316 | **block.error_context, 317 | 'discussion': organized 318 | }, indent=2)) 319 | output.append("}") 320 | 321 | output.append("") # Block separator 322 | 323 | return "\n".join(output) 324 | 325 | def _split_into_context_blocks(self, content: str) -> List[str]: 326 | """Split content into logical blocks based on context shifts.""" 327 | blocks = [] 328 | current_block = [] 329 | 330 | # Split on Human/Assistant markers and major section headers 331 | lines = content.split('\n') 332 | for line in lines: 333 | # New context indicators 334 | if (line.startswith(('Human:', 'Assistant:', '## ', '# ')) or 335 | re.match(r'^[A-Z][a-z]+ \d{1,2}, \d{4}', line)): 336 | if current_block: 337 | blocks.append('\n'.join(current_block)) 338 | current_block = [] 339 | current_block.append(line) 340 | 341 | # Add final block 342 | if current_block: 343 | blocks.append('\n'.join(current_block)) 344 | 345 | # Merge small blocks that are likely part of the same context 346 | merged_blocks = [] 347 | temp_block = [] 348 | for block in blocks: 349 | if len(block.split('\n')) < 5 and temp_block: # Small block 350 | temp_block.append(block) 351 | else: 352 | if temp_block: 353 | merged_blocks.append('\n'.join(temp_block)) 354 | temp_block = [] 355 | temp_block.append(block) 356 | 357 | if temp_block: 358 | merged_blocks.append('\n'.join(temp_block)) 359 | 360 | return merged_blocks 361 | 362 | def _extract_timestamp(self, block: str) -> str: 363 | """Extract timestamp from block.""" 364 | # Look for ISO format timestamps 365 | iso_pattern = r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:[+-]\d{2}:?\d{2})?' 
366 |         if match := re.search(iso_pattern, block):
367 |             return match.group(0)
368 | 
369 |         # Look for other common date/time formats, paired with strptime formats
370 |         patterns = [
371 |             (r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', "%Y-%m-%d %H:%M:%S"),
372 |             (r'\w+ \d{1,2}, \d{4} \d{1,2}:\d{2} (?:AM|PM)', "%B %d, %Y %I:%M %p"),
373 |         ]
374 |         for pattern, fmt in patterns:
375 |             if match := re.search(pattern, block):
376 |                 try:
377 |                     # Convert to ISO format; skip text strptime cannot parse
378 |                     return datetime.datetime.strptime(match.group(0), fmt).isoformat()
379 |                 except ValueError:
380 |                     continue
381 |         # Default to current time if no timestamp found
382 |         return datetime.datetime.now().isoformat()
383 | 
384 |     def _infer_session_type(self, block: str) -> str:
385 |         """Infer the type of development session."""
386 |         # Look for explicit indicators
387 |         type_indicators = {
388 |             'DEBUG': ['error', 'bug', 'fix', 'issue', 'problem', 'traceback'],
389 |             'FEATURE': ['implement', 'add', 'create', 'new feature'],
390 |             'REFACTOR': ['refactor', 'improve', 'optimize', 'clean', 'restructure'],
391 |             'TEST': ['test', 'verify', 'validate', 'check'],
392 |             'DOCS': ['document', 'explain', 'clarify', 'readme'],
393 |             'CONFIG': ['configure', 'setup', 'install', 'environment'],
394 |         }
395 | 
396 |         block_lower = block.lower()
397 |         for session_type, indicators in type_indicators.items():
398 |             if any(ind in block_lower for ind in indicators):
399 |                 return session_type
400 | 
401 |         # Look for code modifications
402 |         if '```' in block or 'def ' in block or 'class ' in block:
403 |             return 'CODE_MOD'
404 | 
405 |         return 'GENERAL'
406 | 
407 |     def _extract_main_issue(self, block: str) -> str:
408 |         """Extract the main issue being discussed."""
409 |         # First look for explicit issue statements
410 |         for pattern in self.issue_patterns:
411 |             if match := re.search(pattern, block, re.IGNORECASE):
412 |                 return match.group(1).strip()
413 | 
414 |         # Look for error traces
415 |         if 'Traceback' in block or 'Error:' in block:
416 |             if error_match := re.search(r'(?:Traceback.*?|Error:)(.*?)(?=\n\w|$)', block, re.DOTALL):
417 |                 return error_match.group(1).strip()
418 | 
419 |         # Look for task/goal statements
420 |         task_patterns = [
421 |             r'(?:need|want|trying) to\s+(.*?)(?:\.|$)',
422 |             r'(?:goal|task) is to\s+(.*?)(?:\.|$)',
423 |             r'(?:working on|implementing)\s+(.*?)(?:\.|$)'
424 |         ]
425 | 
426 |         for pattern in task_patterns:
427 |             if match := re.search(pattern, block, re.IGNORECASE):
428 |                 return match.group(1).strip()
429 | 
430 |         return ""
431 | 
432 |     def _extract_core_problems(self, block: str) -> List[str]:
433 |         """Extract core problems identified in the discussion."""
434 |         problems = []
435 | 
436 |         # Look for explicit problem statements
437 |         for pattern in self.issue_patterns:
438 |             matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE)
439 |             for match in matches:
440 |                 problem = match.group(1).strip()
441 |                 if len(problem) > 10:  # Filter out too short matches
442 |                     problems.append(problem)
443 | 
444 |         # Look for error traces
445 |         error_pattern = r'(?:Traceback.*?(?=\n\w|$)|Error:.*?(?=\n\w|$))'
446 |         if matches := re.finditer(error_pattern, block, re.DOTALL):
447 |             for match in matches:
448 |                 error = match.group(0).split('\n')[0]  # Get first line of traceback
449 |                 problems.append(error)
450 | 
451 |         # Look for "needs to" statements
452 |         needs_pattern = r'needs? 
to (?:be )?(.*?)(?:\.|$)' 453 | if matches := re.finditer(needs_pattern, block, re.IGNORECASE): 454 | for match in matches: 455 | problems.append(f"Needs {match.group(1)}") 456 | 457 | return list(set(problems)) # Remove duplicates 458 | 459 | def _extract_solutions(self, block: str) -> List[str]: 460 | """Extract proposed solutions and approaches.""" 461 | solutions = [] 462 | 463 | # Look for explicit solution patterns 464 | for pattern in self.solution_patterns: 465 | matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE) 466 | for match in matches: 467 | solution = match.group(1).strip() 468 | if len(solution) > 10: # Filter out too short matches 469 | solutions.append(solution) 470 | 471 | # Look for code blocks that implement solutions 472 | code_blocks = re.finditer(r'```.*?\n(.*?)```', block, re.DOTALL) 473 | for block_match in code_blocks: 474 | code = block_match.group(1) 475 | # Extract function/class definitions as solutions 476 | if def_match := re.search(r'def (\w+)', code): 477 | solutions.append(f"Implement {def_match.group(1)} function") 478 | if class_match := re.search(r'class (\w+)', code): 479 | solutions.append(f"Create {class_match.group(1)} class") 480 | 481 | return list(set(solutions)) # Remove duplicates 482 | 483 | def _extract_file_changes(self, block: str) -> Dict[str, List[str]]: 484 | """Extract file changes with semantic meaning.""" 485 | changes = {} 486 | 487 | # Look for file paths and associated changes 488 | file_patterns = [ 489 | r'(?:in|update|modify|create|edit)\s+`?([/\w.-]+\.[/\w.-]+)`?', 490 | r'(?:file|path):\s*`?([/\w.-]+\.[/\w.-]+)`?', 491 | r'(?:^|\s)`?([/\w.-]+\.[/\w.-]+)`?:\s*\w+', 492 | ] 493 | 494 | for pattern in file_patterns: 495 | matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE) 496 | for match in matches: 497 | file_path = match.group(1) 498 | if file_path not in changes: 499 | changes[file_path] = [] 500 | 501 | # Look for associated changes in surrounding context 502 | context = block[max(0, match.start() - 100):min(len(block), match.end() + 100)] 503 | 504 | # Extract change types 505 | change_types = [] 506 | if 'add' in context.lower() or 'create' in context.lower(): 507 | change_types.append('added') 508 | if 'update' in context.lower() or 'modify' in context.lower(): 509 | change_types.append('modified') 510 | if 'remove' in context.lower() or 'delete' in context.lower(): 511 | change_types.append('removed') 512 | if 'fix' in context.lower(): 513 | change_types.append('fixed') 514 | 515 | changes[file_path].extend(change_types) 516 | 517 | return changes 518 | 519 | def _extract_dependencies(self, block: str) -> Dict[str, List[str]]: 520 | """Extract component dependencies.""" 521 | deps = {} 522 | 523 | # Look for import statements in code blocks 524 | code_blocks = re.finditer(r'```.*?\n(.*?)```', block, re.DOTALL) 525 | for block_match in code_blocks: 526 | code = block_match.group(1) 527 | # Extract imports 528 | imports = re.finditer(r'(?:from|import)\s+([\w.]+)(?:\s+import\s+)?', code) 529 | for imp in imports: 530 | module = imp.group(1) 531 | if '.' in module: 532 | parent = module.split('.')[0] 533 | if parent not in deps: 534 | deps[parent] = [] 535 | deps[parent].append(module) 536 | else: 537 | if 'external' not in deps: 538 | deps['external'] = [] 539 | deps['external'].append(module) 540 | 541 | # Look for dependency mentions in text 542 | dep_patterns = [ 543 | r'depends? 
on\s+`?([\w.]+)`?', 544 | r'requires?\s+`?([\w.]+)`?', 545 | r'using\s+`?([\w.]+)`?', 546 | ] 547 | 548 | for pattern in dep_patterns: 549 | matches = re.finditer(pattern, block, re.IGNORECASE) 550 | for match in matches: 551 | dep = match.group(1) 552 | if 'mentioned' not in deps: 553 | deps['mentioned'] = [] 554 | deps['mentioned'].append(dep) 555 | 556 | return deps 557 | 558 | def _extract_reasoning_chain(self, block: str) -> List[Dict[str, Any]]: 559 | """Extract the chain of reasoning in the discussion.""" 560 | chain = [] 561 | 562 | # Look for cause-effect relationships 563 | for pattern in self.reasoning_patterns: 564 | matches = re.finditer(pattern, block, re.IGNORECASE | re.MULTILINE) 565 | for match in matches: 566 | reasoning = match.group(1).strip() 567 | # Look for action and result in surrounding context 568 | context = block[max(0, match.start() - 100):min(len(block), match.end() + 100)] 569 | 570 | # Try to identify action and result 571 | action = "" 572 | if action_match := re.search(r'(?:will|should|must|going to)\s+(.*?)(?:\s+because|\s+since|$)', context): 573 | action = action_match.group(1) 574 | 575 | result = "" 576 | if result_match := re.search(r'(?:this will|resulting in|leads to)\s+(.*?)(?:\.|$)', context): 577 | result = result_match.group(1) 578 | 579 | if action or result: 580 | chain.append({ 581 | "action": action or "unspecified", 582 | "reason": reasoning, 583 | "result": result or "unspecified" 584 | }) 585 | 586 | return chain 587 | 588 | def _extract_state_changes(self, block: str) -> Dict[str, Any]: 589 | """Extract important state changes.""" 590 | changes = {} 591 | 592 | # Look for state change indicators 593 | state_patterns = { 594 | 'status': r'status(?:\s+is|\s+changed\s+to)?\s+`?([\w_]+)`?', 595 | 'phase': r'phase(?:\s+is|\s+moved\s+to)?\s+`?([\w_]+)`?', 596 | 'version': r'version(?:\s+is|\s+updated\s+to)?\s+`?([\w.]+)`?', 597 | 'config': r'config(?:\s+is|\s+set\s+to)?\s+`?([\w_]+)`?', 598 | } 599 | 600 | for key, pattern in state_patterns.items(): 601 | if match := re.search(pattern, block, re.IGNORECASE): 602 | changes[key] = match.group(1) 603 | 604 | # Look for variable assignments in code 605 | code_blocks = re.finditer(r'```.*?\n(.*?)```', block, re.DOTALL) 606 | for block_match in code_blocks: 607 | code = block_match.group(1) 608 | assignments = re.finditer(r'(\w+)\s*=\s*([^;\n]+)', code) 609 | for assign in assignments: 610 | var, value = assign.groups() 611 | if var.isupper(): # Likely a constant/config 612 | changes[f"code_{var.lower()}"] = value.strip() 613 | 614 | return changes 615 | 616 | def _extract_error_context(self, block: str) -> Dict[str, List[str]]: 617 | """Extract error context and related information.""" 618 | context = {} 619 | 620 | # Look for error traces 621 | if 'Traceback' in block or 'Error:' in block: 622 | traces = re.finditer(r'(?:Traceback.*?(?=\n\w|$)|Error:.*?(?=\n\w|$))', block, re.DOTALL) 623 | for trace in traces: 624 | error_text = trace.group(0) 625 | 626 | # Extract error type 627 | if type_match := re.search(r'(\w+Error):', error_text): 628 | error_type = type_match.group(1) 629 | if error_type not in context: 630 | context[error_type] = [] 631 | 632 | # Extract relevant lines 633 | lines = error_text.split('\n') 634 | context[error_type].extend([ 635 | line.strip() for line in lines 636 | if line.strip() and not line.startswith(' ') 637 | ]) 638 | 639 | # Look for file references 640 | files = re.finditer(r'File "([^"]+)", line (\d+)', error_text) 641 | for file_match in files: 642 | 
context[error_type].append(f"In {file_match.group(1)}:{file_match.group(2)}") 643 | 644 | # Look for error-related discussion 645 | error_discussion = re.finditer(r'(?:error|issue|bug|problem).*?:\s*(.*?)(?:\n|$)', block, re.IGNORECASE) 646 | for disc in error_discussion: 647 | if 'discussion' not in context: 648 | context['discussion'] = [] 649 | context['discussion'].append(disc.group(1).strip()) 650 | 651 | return context 652 | 653 | def _consolidate_problems(self, problems: List[str]) -> List[Dict[str, Any]]: 654 | """Consolidate similar problems into higher-level issues with context.""" 655 | # First group by root cause 656 | root_causes = { 657 | 'import': { 658 | 'pattern': r'(?:import|from)\s+([^\s]+)', 659 | 'problems': [], 660 | 'affected_modules': set() 661 | }, 662 | 'initialization': { 663 | 'pattern': r'(?:init|create|setup)\s+([^\s]+)', 664 | 'problems': [], 665 | 'affected_components': set() 666 | }, 667 | 'validation': { 668 | 'pattern': r'(?:valid|schema|type)\s+([^\s]+)', 669 | 'problems': [], 670 | 'affected_fields': set() 671 | }, 672 | 'undefined': { 673 | 'pattern': r'name\s+\'([^\']+)\'\s+is not defined', 674 | 'problems': [], 675 | 'missing_names': set() 676 | }, 677 | 'attribute': { 678 | 'pattern': r'has no attribute\s+\'([^\']+)\'', 679 | 'problems': [], 680 | 'missing_attrs': set() 681 | }, 682 | 'other': { 683 | 'pattern': None, 684 | 'problems': [], 685 | 'context': set() 686 | } 687 | } 688 | 689 | # Categorize each problem 690 | for problem in problems: 691 | problem = problem.strip() 692 | if not problem: 693 | continue 694 | 695 | matched = False 696 | for category, info in root_causes.items(): 697 | if category == 'other': 698 | continue 699 | 700 | if matches := re.finditer(info['pattern'], problem, re.IGNORECASE): 701 | for match in matches: 702 | matched = True 703 | info['problems'].append(problem) 704 | if category == 'import': 705 | info['affected_modules'].add(match.group(1)) 706 | elif category == 'initialization': 707 | info['affected_components'].add(match.group(1)) 708 | elif category == 'validation': 709 | info['affected_fields'].add(match.group(1)) 710 | elif category == 'undefined': 711 | info['missing_names'].add(match.group(1)) 712 | elif category == 'attribute': 713 | info['missing_attrs'].add(match.group(1)) 714 | 715 | if not matched: 716 | root_causes['other']['problems'].append(problem) 717 | 718 | # Convert to final format 719 | consolidated = [] 720 | for category, info in root_causes.items(): 721 | if not info['problems']: 722 | continue 723 | 724 | issue = { 725 | 'type': category, 726 | 'summary': self._generate_summary(category, info), 727 | 'details': list(set(info['problems'])), # Remove duplicates 728 | 'context': {} 729 | } 730 | 731 | # Add category-specific context 732 | if category == 'import': 733 | issue['context']['modules'] = list(info['affected_modules']) 734 | elif category == 'initialization': 735 | issue['context']['components'] = list(info['affected_components']) 736 | elif category == 'validation': 737 | issue['context']['fields'] = list(info['affected_fields']) 738 | elif category == 'undefined': 739 | issue['context']['names'] = list(info['missing_names']) 740 | elif category == 'attribute': 741 | issue['context']['attributes'] = list(info['missing_attrs']) 742 | elif category == 'other': 743 | issue['context']['general'] = list(info['context']) 744 | 745 | consolidated.append(issue) 746 | 747 | return consolidated 748 | 749 | def _generate_summary(self, category: str, info: Dict[str, Any]) -> str: 750 | 
"""Generate a concise summary of the issue category.""" 751 | if category == 'import': 752 | modules = ', '.join(info['affected_modules']) 753 | return f"Import issues with modules: {modules}" 754 | elif category == 'initialization': 755 | components = ', '.join(info['affected_components']) 756 | return f"Initialization issues in components: {components}" 757 | elif category == 'validation': 758 | fields = ', '.join(info['affected_fields']) 759 | return f"Validation issues with fields: {fields}" 760 | elif category == 'undefined': 761 | names = ', '.join(info['missing_names']) 762 | return f"Undefined names: {names}" 763 | elif category == 'attribute': 764 | attrs = ', '.join(info['missing_attrs']) 765 | return f"Missing attributes: {attrs}" 766 | else: 767 | return "Other issues found" 768 | 769 | def _link_solutions_to_problems(self, problems: List[Dict[str, Any]], solutions: List[str]) -> List[Dict[str, Any]]: 770 | """Link solutions to their corresponding problems.""" 771 | linked_problems = copy.deepcopy(problems) 772 | 773 | for problem in linked_problems: 774 | problem['solutions'] = [] 775 | problem_text = ' '.join([problem['summary']] + problem['details']).lower() 776 | 777 | for solution in solutions: 778 | # Look for solutions that mention the problem's context 779 | if any(word.lower() in solution.lower() for word in problem['context'].get('modules', [])): 780 | problem['solutions'].append(solution) 781 | elif any(word.lower() in solution.lower() for word in problem['context'].get('components', [])): 782 | problem['solutions'].append(solution) 783 | elif any(word.lower() in solution.lower() for word in problem['context'].get('fields', [])): 784 | problem['solutions'].append(solution) 785 | elif any(word.lower() in solution.lower() for word in problem['context'].get('names', [])): 786 | problem['solutions'].append(solution) 787 | elif any(word.lower() in solution.lower() for word in problem['context'].get('attributes', [])): 788 | problem['solutions'].append(solution) 789 | # Look for solutions that mention keywords from the problem 790 | elif any(word in solution.lower() for word in problem_text.split()): 791 | problem['solutions'].append(solution) 792 | 793 | return linked_problems 794 | 795 | def _organize_discussion(self, points: List[str]) -> Dict[str, List[str]]: 796 | """Organize discussion points into categories.""" 797 | organized = { 798 | 'analysis': [], 799 | 'changes': [], 800 | 'errors': [], 801 | 'solutions': [], 802 | 'verifications': [] 803 | } 804 | 805 | for point in points: 806 | point = point.strip() 807 | # Skip empty or uninformative points 808 | if not point or point in ['Analyzed', 'Edited', 'Edit:', 'CopyInsert']: 809 | continue 810 | 811 | # Categorize based on content 812 | if any(word in point.lower() for word in ['error', 'exception', 'fail']): 813 | organized['errors'].append(point) 814 | elif any(word in point.lower() for word in ['update', 'change', 'modify']): 815 | organized['changes'].append(point) 816 | elif any(word in point.lower() for word in ['fix', 'solve', 'resolve']): 817 | organized['solutions'].append(point) 818 | elif any(word in point.lower() for word in ['check', 'verify', 'test']): 819 | organized['verifications'].append(point) 820 | else: 821 | organized['analysis'].append(point) 822 | 823 | # Remove empty categories 824 | return {k: v for k, v in organized.items() if v} 825 | 826 | def main(): 827 | """Main entry point for the script.""" 828 | parser = argparse.ArgumentParser( 829 | description='Convert chat logs to SPR (Sparse 
Priming Representation) format',
830 |         formatter_class=argparse.RawDescriptionHelpFormatter,
831 |         epilog="""
832 | Example usage:
833 |   %(prog)s chat_history.txt
834 |   %(prog)s chat_history.txt -o custom_output.spr
835 |   %(prog)s chat_history.txt -v
836 |         """
837 |     )
838 |     parser.add_argument(
839 |         'input_file',
840 |         type=str,
841 |         help='Path to the input chat log file'
842 |     )
843 |     parser.add_argument(
844 |         '-o', '--output',
845 |         type=str,
846 |         default=None,
847 |         help='Path to the output SPR file (default: input_file.spr.txt)'
848 |     )
849 |     parser.add_argument(
850 |         '-v', '--verbose',
851 |         action='store_true',
852 |         help='Enable verbose logging'
853 |     )
854 |     parser.add_argument(
855 |         '--debug',
856 |         action='store_true',
857 |         help='Enable debug mode with additional output'
858 |     )
859 | 
860 |     args = parser.parse_args()
861 | 
862 |     # Configure logging based on arguments
863 |     if args.debug:
864 |         logging.getLogger().setLevel(logging.DEBUG)
865 |     elif args.verbose:
866 |         logging.getLogger().setLevel(logging.INFO)
867 |     else:
868 |         logging.getLogger().setLevel(logging.WARNING)
869 | 
870 |     try:
871 |         input_path = Path(args.input_file)
872 |         if not input_path.exists():
873 |             logger.error(f"Input file not found: {input_path}")
874 |             return 1
875 | 
876 |         # Determine output path
877 |         output_path = args.output
878 |         if output_path is None:
879 |             output_path = input_path.with_suffix('.spr.txt')
880 |         output_path = Path(output_path)
881 | 
882 |         # Create output directory if it doesn't exist
883 |         output_path.parent.mkdir(parents=True, exist_ok=True)
884 | 
885 |         logger.info(f"Processing {input_path}")
886 |         logger.info(f"Output will be written to {output_path}")
887 | 
888 |         # Parse and generate SPR
889 |         chat_parser = ChatLogParser(str(input_path))
890 |         chat_parser.parse_log()
891 |         spr = chat_parser.generate_spr()
892 | 
893 |         # Write output
894 |         output_path.write_text(spr)
895 |         logger.info(f"Successfully generated SPR format at {output_path}")
896 | 
897 |         if args.debug:
898 |             # Print some statistics in debug mode
899 |             logger.debug(f"Found {len(chat_parser.context_blocks)} context blocks")
900 | 
901 |         return 0
902 | 
903 |     except Exception as e:
904 |         logger.error(f"Error processing chat log: {str(e)}")
905 |         if args.debug:
906 |             logger.exception("Detailed error trace:")
907 |         return 1
908 | 
909 | if __name__ == "__main__":
910 |     exit(main())
911 | 
--------------------------------------------------------------------------------