├── LICENSE ├── README.md ├── agent ├── agent.py ├── agent_helpers.py ├── agents_config.yaml ├── multi_source.py ├── output │ ├── sample_output.json │ └── sample_output.txt └── schemas.py └── requirements.txt /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 tegridy~~ 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # abstract-agent 2 | 3 | Easily extendable 100% local multi-agent system for generating novel research hypotheses, abstracts, and references. 4 | 5 | All powered by local Ollama LLMs. No API keys. No cloud. Just you, your GPU/CPU, and public sources. 6 | 7 | --- 8 | 9 | ## Features 10 | - Multi-agent pipeline: breakdown, critique, synthesize, innovate, and polish 11 | - Pulls from public sources: arXiv, Semantic Scholar, EuropePMC, Crossref, DOAJ, bioRxiv, medRxiv, OpenAlex, PubMed 12 | - Scores, ranks, and summarizes literature 13 | - Uses Ollama for summarization and novelty checks 14 | - Final output is a clean, human-readable panel with stats / insights 15 | 16 | --- 17 | 18 | ## Example Outputs 19 | 20 | ```bash 21 | ────────────────────────────────────────────── Pipeline 'Research Hypothesis Generation' Finished in 102.67s ─────────────────────────────────────────────── 22 | ────────────────────────────────────────────────────────────────── Final Results Summary ─────────────────────────────────────────────────────────────────── 23 | ╭────────────────────────────────────────────────────────────── Final Hypothesis Structured ───────────────────────────────────────────────────────────────╮ 24 | │ This research introduces a novel approach to Large Language Model (LLM) compression predicated on Neuro-Symbolic Contextual Compression. We propose a │ 25 | │ system that translates LLM attention maps into a discrete, graph-based representation, subsequently employing a learned graph pruning algorithm to │ 26 | │ remove irrelevant nodes while preserving critical semantic relationships. Unlike existing compression methods focused on direct neural manipulation, │ 27 | │ this approach leverages the established techniques of graph pruning, offering potentially significant gains in model size and efficiency. 
The │ 28 | │ integration of learned pruning, adapting to specific task and input characteristics, represents a fundamentally new paradigm for LLM compression, moving │ 29 | │ beyond purely neural optimizations. │ 30 | ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ 31 | 32 | ╭─────────────────────────────────────────────────────────────────── Novelty Assessment ───────────────────────────────────────────────────────────────────╮ 33 | │ │ 34 | │ │ 35 | │ **Novelty Score: 7/10** │ 36 | │ │ 37 | │ **Reasoning:** │ 38 | │ │ 39 | │ This hypothesis demonstrates a moderate level of novelty, primarily due to the specific combination of techniques and the integration of neuro-symbolic │ 40 | │ approaches. Let's break down the assessment: │ 41 | │ │ 42 | │ * **Elements of Novelty (Strengths):** │ 43 | │ * **Neuro-Symbolic Contextual Compression:** The core idea of translating LLM attention maps into a discrete, graph-based representation *is* a │ 44 | │ relatively new area of exploration. While graph pruning exists, applying it specifically to the output of LLM attention maps – and framing it within a │ 45 | │ neuro-symbolic context – is a distinctive aspect. │ 46 | │ * **Learned Graph Pruning:** The explicit mention of a *learned* graph pruning algorithm elevates the novelty. Many pruning methods are static, │ 47 | │ whereas learning the pruning criteria based on task and input characteristics is a significant step forward. │ 48 | │ * **Integration of Graph Pruning with LLMs:** While graph pruning is used in other domains, its application to LLMs, particularly in this way, is │ 49 | │ not widely established. │ 50 | │ │ 51 | │ * **Elements Limiting Novelty (Weaknesses):** │ 52 | │ * **Graph Pruning is Not Entirely New:** As highlighted in Paper 1, graph pruning techniques exist in general. The core concept of pruning nodes │ 53 | │ based on importance is well-established. │ 54 | │ * **Related Work Exists:** Several papers (Papers 2, 3, 4, 5, 6, 7) address aspects of model compression, including quantization, sparsity, and │ 55 | │ dynamic budgets. While the *combination* is novel, the individual components are not. Paper 7's "thinking step-by-step compression" is particularly │ 56 | │ relevant, even though it uses a different framing (dynamic compression of reasoning steps). │ 57 | │ * **Fine-grained vs. Coarse-grained:** The hypothesis positions itself against "coarse-grained" methods (Paper 1). However, many current compression │ 58 | │ techniques are moving towards finer-grained approaches. │ 59 | │ │ 60 | │ │ 61 | │ **Justification for the Score:** │ 62 | │ │ 63 | │ A score of 7 reflects that the hypothesis presents a novel *approach* rather than a completely new concept. The combination of learned graph pruning │ 64 | │ with attention maps represents a worthwhile exploration. However, it's not a revolutionary breakthrough because graph pruning itself isn’t entirely │ 65 | │ novel, and the field is already actively investigating various compression strategies. │ 66 | │ │ 67 | │ **Recommendations for Strengthening the Hypothesis:** │ 68 | │ │ 69 | │ * **Quantify the Expected Gains:** Adding specific claims about the expected reduction in model size and efficiency would strengthen the hypothesis. │ 70 | │ * **Elaborate on the "Neuro-Symbolic" Aspect:** Provide more detail on how the discrete graph representation represents the underlying semantic │ 71 | │ relationships within the LLM. 
│ 72 | │ * **Highlight the Advantage over Existing Methods:** Clearly articulate *why* this approach is expected to be superior to existing techniques (e.g., in │ 73 | │ terms of accuracy, speed, or ease of implementation). │ 74 | ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ 75 | 76 | ``` 77 | 78 | --- 79 | 80 | ## Quickstart 81 | 82 | 1. Clone this repo: 83 | ```bash 84 | git clone https://github.com/tegridydev/abstract-agent 85 | cd abstract-agent 86 | ``` 87 | 2. Install dependencies: 88 | ```bash 89 | pip install -r requirements.txt 90 | ``` 91 | 3. Install [Ollama](https://ollama.com/download) and pull a model (e.g. gemma3:4b): 92 | ```bash 93 | ollama pull gemma3:4b 94 | ``` 95 | 4. Run the agent: 96 | ```bash 97 | python agent.py 98 | ``` 99 | 100 | --- 101 | 102 | ## Agent Pipeline (Lego block style) 103 | - **Agent A:** Breaks down your topic into all the core pieces 104 | - **Agent B:** Roasts the literature, finds gaps and trends 105 | - **Agent C:** Synthesizes new directions 106 | - **Agent D:** Goes wild, generates bold hypotheses 107 | - **Agent E:** Polishes, references, and scores the final abstract 108 | - **Novelty Check:** Checks if it's actually new or just recycled 109 | 110 | --- 111 | 112 | ## Output 113 | - Final hypothesis, novelty score, references, and run stats (references searched/used, time taken) 114 | 115 | --- 116 | 117 | ## Dependencies 118 | - ollama 119 | - rich 120 | - arxiv 121 | - requests 122 | - xmltodict 123 | - pydantic 124 | - pyyaml 125 | 126 | No API keys. All sources are public. 127 | 128 | --- 129 | 130 | ## How to modify 131 | - Edit `agents_config.yaml` to change the agent pipeline, prompts, or personas 132 | - Add new sources in `multi_source.py` 133 | 134 | --- 135 | 136 | ## License / Citations 137 | MIT. Use it, fork it, break it, share it. 
Just give a shoutout to tegridydev if you want <3 138 | 139 | [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE) 140 | 141 | **Author:** tegridydev 142 | **Repo:** https://github.com/tegridydev/abstract-agent 143 | 144 | -------------------------------------------------------------------------------- /agent/agent.py: -------------------------------------------------------------------------------- 1 | # agent.py 2 | # abstract-agent v1.1 3 | # Author: tegridydev 4 | # Repo: https://github.com/tegridydev/abstract-agent 5 | # License: MIT 6 | # Year: 2025 7 | 8 | import os 9 | import sys 10 | import uuid 11 | import importlib 12 | import time 13 | import json 14 | import traceback 15 | import re 16 | from datetime import datetime 17 | from typing import Dict, Any, List, Optional, Tuple 18 | import yaml 19 | import requests 20 | from rich.console import Console 21 | from rich.prompt import Prompt, IntPrompt 22 | from rich.table import Table 23 | from rich.panel import Panel 24 | from rich.box import ROUNDED 25 | from ollama import Client, ResponseError 26 | 27 | try: 28 | import multi_source 29 | import agent_helpers 30 | except ImportError as e: 31 | console = Console() 32 | console.print(f"[bold red]Error: Failed to import required modules (multi_source, agent_helpers): {e}[/bold red]") 33 | console.print("Please ensure multi_source.py and agent_helpers.py exist and are in the same directory as agent.py.") 34 | sys.exit(1) 35 | 36 | console = Console() 37 | 38 | CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'agents_config.yaml') 39 | 40 | def load_pipeline_config() -> Dict[str, Any]: 41 | """Loads and validates the pipeline configuration from YAML.""" 42 | try: 43 | with open(CONFIG_PATH, 'r', encoding='utf-8') as f: 44 | config = yaml.safe_load(f) 45 | 46 | # Basic validation 47 | if not isinstance(config, dict): 48 | raise ValueError("Config Error: Configuration file is not a valid YAML dictionary.") 49 | if 'steps' not in config or not isinstance(config['steps'], list): 50 | raise ValueError("Config Error: 'steps' list is missing or invalid in config.") 51 | if 'context_init' not in config or not isinstance(config['context_init'], list): 52 | config['context_init'] = [] 53 | console.print("[Config] Note: 'context_init' key not found or invalid, assuming no initial context required beyond user input.", style="yellow") 54 | if 'final_outputs' not in config or not isinstance(config['final_outputs'], list): 55 | config['final_outputs'] = [] 56 | console.print("[Config] Note: 'final_outputs' key not found or invalid, default display/save behaviour will be used.", style="yellow") 57 | 58 | for i, step in enumerate(config['steps']): 59 | if not isinstance(step, dict) or not step.get('name') or not step.get('type'): 60 | raise ValueError(f"Config Error: Invalid step configuration at index {i}. 
Each step must be a dictionary with at least 'name' and 'type' keys.") 61 | 62 | return config 63 | except FileNotFoundError: 64 | console.print(f"[bold red]Error: Configuration file not found at {CONFIG_PATH}[/bold red]") 65 | sys.exit(1) 66 | except yaml.YAMLError as e: 67 | console.print(f"[bold red]Error: Failed to parse configuration file {CONFIG_PATH}: {e}[/bold red]") 68 | sys.exit(1) 69 | except ValueError as e: 70 | console.print(f"[bold red]{e}[/bold red]") 71 | sys.exit(1) 72 | except Exception as e: 73 | console.print(f"[bold red]An unexpected error occurred while loading config: {e}[/bold red]") 74 | traceback.print_exc() 75 | sys.exit(1) 76 | 77 | 78 | # --- Core Agent Class --- 79 | class AbstractAgent: 80 | """Represents an LLM agent interaction point using Ollama.""" 81 | def __init__(self, model: str = 'qwen3:0.6b', host: str = 'http://localhost:11434'): 82 | """ 83 | Initializes the agent. 84 | 85 | Args: 86 | model: The name of the Ollama model to use (e.g., 'qwen3:0.6b'). 87 | host: The URL of the Ollama host. 88 | """ 89 | if not model: 90 | raise ValueError("Ollama model name cannot be empty.") 91 | self.model = model 92 | self.host = host 93 | try: 94 | self.client = Client(host=self.host) 95 | except Exception as e: 96 | raise ConnectionError(f"Failed to initialize Ollama client for host '{self.host}'. Is Ollama running? Error: {e}") from e 97 | 98 | def ask(self, prompt: str, system_prompt: Optional[str] = None, temperature: float = 0.7, timeout: int = 180) -> str: 99 | """ 100 | Sends a prompt to the Ollama model and returns the response content. 101 | 102 | Args: 103 | prompt: The user prompt to send to the model. 104 | system_prompt: An optional system prompt to guide the model's behavior. 105 | temperature: The temperature setting for the Ollama model. 106 | timeout: Timeout in seconds for the Ollama API call (Note: library support varies). 107 | 108 | Returns: 109 | The content of the model's response message. 110 | 111 | Raises: 112 | ConnectionError: If there's an issue communicating with the Ollama API or host. 113 | ValueError: If prompt is empty or response is malformed. 114 | """ 115 | if not prompt: 116 | raise ValueError("User prompt cannot be empty.") 117 | 118 | messages = [] 119 | if system_prompt: 120 | messages.append({'role': 'system', 'content': system_prompt}) 121 | messages.append({'role': 'user', 'content': prompt}) 122 | 123 | try: 124 | response = self.client.chat( 125 | model=self.model, 126 | messages=messages, 127 | options={'temperature': temperature} 128 | ) 129 | 130 | message = response.get('message', {}) 131 | content = message.get('content') 132 | 133 | if content is not None: 134 | return content 135 | else: 136 | console.print(f"[yellow]Warning: Ollama response for model '{self.model}' missing expected content.[/yellow]") 137 | console.print(f"[dim]Full Response: {response}[/dim]") 138 | raise ValueError(f"Ollama response for model '{self.model}' did not contain expected message content.") 139 | 140 | except ResponseError as e: 141 | if e.status_code == 404: 142 | error_message = f"Ollama API error: Model '{self.model}' not found on host '{self.host}'. Please ensure the model is pulled (e.g., `ollama pull {self.model}`)." 
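# A 404 here means the requested model has not been pulled on this Ollama host; any other status code falls through to the generic branch below and is reported with its code and error text.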
143 | else: 144 | error_message = f"Ollama API error (Model: {self.model}): Status {e.status_code} - {e.error}" 145 | raise ConnectionError(error_message) from e 146 | except (requests.exceptions.ConnectionError, ConnectionRefusedError) as e: 147 | raise ConnectionError(f"Failed to connect to Ollama host '{self.host}'. Is Ollama running and accessible at this address? Error: {e}") from e 148 | except Exception as e: 149 | console.print(f"[red]Unexpected error during Ollama API call:[/red] {type(e).__name__} - {e}") 150 | raise ConnectionError(f"An unexpected error occurred communicating with Ollama host '{self.host}' for model '{self.model}'. Error: {e}") from e 151 | 152 | 153 | # --- Pipeline Manager --- 154 | class PipelineManager: 155 | """Orchestrates the execution of steps defined in the pipeline config.""" 156 | def __init__(self, config: Dict[str, Any]): 157 | """ 158 | Initializes the PipelineManager. 159 | 160 | Args: 161 | config: The loaded pipeline configuration dictionary. 162 | """ 163 | self.config = config 164 | self.ollama_host = config.get('ollama_host', os.environ.get('OLLAMA_HOST', 'http://localhost:11434')) 165 | console.print(f"[info]Using Ollama Host:[/info] {self.ollama_host}") 166 | self._agent_cache: Dict[str, AbstractAgent] = {} 167 | self.current_step_outputs: List[str] = [] 168 | 169 | def _get_agent(self, model_name: str) -> AbstractAgent: 170 | """Gets or creates an AbstractAgent instance for a given model, caching it.""" 171 | if model_name not in self._agent_cache: 172 | console.print(f"[info]Initializing agent for model:[/info] {model_name}") 173 | try: 174 | self._agent_cache[model_name] = AbstractAgent(model=model_name, host=self.ollama_host) 175 | except (ValueError, ConnectionError) as e: 176 | console.print(f"[bold red]Fatal Error: Cannot initialize agent for model '{model_name}'. Halting.[/bold red]") 177 | console.print(f"[red]Details: {e}[/red]") 178 | raise RuntimeError(f"Failed to initialize agent '{model_name}'") from e 179 | return self._agent_cache[model_name] 180 | 181 | def _format_prompt(self, template: str, context: Dict[str, Any], inputs: List[str]) -> str: 182 | """ 183 | Safely formats a prompt template string using specified input keys from the context. 184 | (Note: Primarily for potential non-LLM steps using templates, LLM steps format directly now). 185 | """ 186 | format_dict = {} 187 | missing_keys = [] 188 | for key in inputs: 189 | if key not in context: 190 | missing_keys.append(key) 191 | if missing_keys: 192 | raise ValueError(f"Missing required context inputs for prompt formatting: {', '.join(missing_keys)}") 193 | 194 | for key in inputs: 195 | format_dict[key] = context[key] 196 | 197 | try: 198 | return template.format(**format_dict) 199 | except KeyError as e: 200 | raise ValueError(f"Prompt template expects key '{e}' which is not listed in the step's 'inputs' list.") from e 201 | except Exception as e: 202 | raise ValueError(f"Error formatting prompt template: {e}") from e 203 | 204 | 205 | def _call_tool(self, function_path: str, context: Dict[str, Any], inputs: List[str], params: Dict[str, Any]) -> Dict[str, Any]: 206 | """ 207 | Dynamically calls a Python function specified in the config. Maps results to output keys. 208 | """ 209 | try: 210 | if '.' not in function_path: 211 | raise ValueError(f"Invalid function path '{function_path}'. 
Must include module (e.g., 'module.function').") 212 | module_path, function_name = function_path.rsplit('.', 1) 213 | 214 | try: 215 | module = importlib.import_module(module_path) 216 | except ImportError as e: 217 | raise ImportError(f"Could not import module '{module_path}' specified in function path '{function_path}': {e}") from e 218 | 219 | if not hasattr(module, function_name): 220 | raise AttributeError(f"Function '{function_name}' not found in module '{module_path}'.") 221 | func = getattr(module, function_name) 222 | if not callable(func): 223 | raise TypeError(f"'{function_path}' is not a callable function.") 224 | 225 | func_args = {} 226 | for key in inputs: 227 | if key in context: 228 | func_args[key] = context[key] 229 | else: 230 | raise ValueError(f"Internal Error: Missing required context input '{key}' for function '{function_path}' during tool call (should have been caught earlier).") 231 | func_args.update(params) 232 | 233 | result = func(**func_args) 234 | 235 | output_keys = self.current_step_outputs 236 | if not output_keys: # If step defines no outputs 237 | if result is not None: 238 | console.print(f"[yellow]Warning: Tool '{function_path}' returned a result ({type(result)}) but no output keys were defined for the step. Result ignored.[/yellow]") 239 | return {} # Return empty dict if no outputs expected 240 | 241 | if isinstance(result, tuple): 242 | if len(result) != len(output_keys): 243 | raise ValueError(f"Tool Error: Function '{function_path}' returned a tuple with {len(result)} elements, but step expected {len(output_keys)} outputs ({output_keys}).") 244 | return dict(zip(output_keys, result)) 245 | elif len(output_keys) == 1: 246 | # If expecting a single output, wrap the result in a dict 247 | return {output_keys[0]: result} 248 | else: 249 | # If expecting multiple outputs, but got a single non-tuple value (likely an error) 250 | if not isinstance(result, dict): 251 | raise TypeError(f"Tool Error: Function '{function_path}' returned a single value of type '{type(result)}', but step expected multiple outputs ({output_keys}). Did you mean to return a tuple or dict?") 252 | # If result is a dict, check if it contains all expected keys 253 | missing_keys_in_result = [key for key in output_keys if key not in result] 254 | if missing_keys_in_result: 255 | raise ValueError(f"Tool Error: Function '{function_path}' returned a dictionary, but it's missing expected output key(s): {', '.join(missing_keys_in_result)}. Expected: {output_keys}.") 256 | # Return only the expected keys from the result dict 257 | return {key: result[key] for key in output_keys} 258 | 259 | except (AttributeError, TypeError, ImportError, ValueError) as e: 260 | # Catch errors related to finding/calling the function or validating results 261 | raise ValueError(f"Error preparing or calling function '{function_path}': {e}") from e 262 | except Exception as e: 263 | # Catch errors raised *during* the execution of the tool function 264 | # Provide more context in the error message 265 | console.print(f"[red]Error details during tool execution '{function_path}':[/red] {traceback.format_exc()}", highlight=False) 266 | raise RuntimeError(f"Error executing tool '{function_path}'. Check logs above for details. 
Original error: {e}") from e 267 | 268 | def _check_condition(self, condition: Optional[str], context: Dict[str, Any]) -> bool: 269 | """Evaluates a simple condition string against the context.""" 270 | if not condition: 271 | return True # No condition means always run 272 | 273 | try: 274 | # Simple check for key existence and truthiness (safer than eval) 275 | # Handles "context.get('key')" or "context.get('key', default_value)" patterns more safely. 276 | match = re.match(r"^\s*context\.get\(['\"]([^'\"]+)['\"](?:,\s*(.+))?\)\s*$", condition) 277 | if match: 278 | key = match.group(1) 279 | return bool(context.get(key)) 280 | else: 281 | console.print(f"[yellow]Warning: Using restricted 'eval' for condition '{condition}'. Ensure config file is trusted.[/yellow]", highlight=False) 282 | safe_builtins = {'True': True, 'False': False, 'None': None, 'str': str, 'int': int, 'float': float, 'len': len, 'list': list, 'dict': dict} 283 | return eval(condition, {"__builtins__": safe_builtins}, {"context": context}) 284 | 285 | except Exception as e: 286 | console.print(f"[yellow]Warning: Failed to evaluate condition '{condition}': {e}. Step will be skipped.[/yellow]", highlight=False) 287 | return False 288 | 289 | def run_pipeline(self, initial_context: Dict[str, Any]) -> Dict[str, Any]: 290 | """Executes the pipeline defined in the configuration.""" 291 | start_time = time.time() 292 | 293 | context = initial_context.copy() 294 | pipeline_name = self.config.get('pipeline_name', 'Untitled Pipeline') 295 | pipeline_version = self.config.get('version', 'N/A') 296 | console.print(Panel(f"[bold cyan]Starting Pipeline: {pipeline_name}[/bold cyan]", subtitle=f"Version {pipeline_version}")) 297 | 298 | if not self.config.get('steps'): 299 | console.print("[yellow]Warning: No steps found in the pipeline configuration.[/yellow]") 300 | return context 301 | 302 | for i, step in enumerate(self.config['steps']): 303 | step_name = step['name'] 304 | step_type = step['type'] 305 | inputs = step.get('inputs', []) 306 | outputs = step.get('outputs', []) 307 | self.current_step_outputs = outputs 308 | 309 | console.rule(f"[bold yellow]Step {i+1}: {step_name} (Type: {step_type})[/bold yellow]") 310 | 311 | # 1. Check Condition 312 | condition = step.get('run_if') 313 | if not self._check_condition(condition, context): 314 | console.print(f"[yellow]Skipped due to condition: {condition or 'N/A'}[/yellow]\n") 315 | continue 316 | 317 | missing_inputs = [key for key in inputs if key not in context] 318 | if missing_inputs: 319 | error_msg = f"Missing required inputs for step '{step_name}': {', '.join(missing_inputs)}. Halting pipeline." 
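# Every key a step lists under 'inputs' must already exist in the pipeline context (seeded at startup or written by an earlier step's 'outputs'); the run halts here rather than sending an incomplete prompt or tool call.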
320 | console.print(f"[bold red]Error: {error_msg}[/bold red]") 321 | raise ValueError(error_msg) 322 | 323 | step_start_time = time.time() 324 | try: 325 | result_data = {} 326 | if step_type in ['llm_agent', 'ollama_call', 'summary_ollama', 'novelty_check_ollama']: 327 | # --- Handle LLM/Ollama Calls --- 328 | model = step.get('model') 329 | if not model: raise ValueError(f"Config Error: Missing 'model' definition for LLM step '{step_name}'.") 330 | agent = self._get_agent(model) 331 | prompt_template = step.get('prompt_template') 332 | if not prompt_template: raise ValueError(f"Config Error: Missing 'prompt_template' for LLM step '{step_name}'.") 333 | 334 | system_prompt = step.get('persona') or step.get('system_prompt') 335 | 336 | prompt_format_dict = {key: context[key] for key in inputs} 337 | if 'name' not in prompt_format_dict: 338 | prompt_format_dict['name'] = step_name 339 | if 'persona' not in prompt_format_dict and 'persona' in step: 340 | prompt_format_dict['persona'] = step.get('persona') 341 | 342 | try: 343 | user_prompt = prompt_template.format(**prompt_format_dict) 344 | except KeyError as e: 345 | raise ValueError(f"Prompt Template Error: Template for step '{step_name}' expects key '{e}' which is missing from context inputs ({inputs}) and step metadata (name, persona).") from e 346 | except Exception as e: 347 | raise ValueError(f"Prompt Template Error: Failed to format prompt for step '{step_name}': {e}") from e 348 | 349 | console.print(f"[magenta]Model:[/magenta] {model}") 350 | if system_prompt: 351 | console.print(Panel(system_prompt, title="System Prompt", style="dim", border_style="dim", width=100)) 352 | console.print(Panel(user_prompt, title="User Prompt Sent", style="cyan", border_style="cyan", width=100)) 353 | temperature = float(step.get('temperature', 0.7)) 354 | 355 | response_content = agent.ask(user_prompt, system_prompt=system_prompt, temperature=temperature) 356 | console.print(Panel(response_content, title="Response Received", style="green", border_style="green", width=100)) 357 | if len(outputs) != 1: 358 | raise ValueError(f"Config Error: LLM/Ollama steps currently support exactly one output key. Step '{step_name}' expects {len(outputs)} ({outputs}).") 359 | result_data = {outputs[0]: response_content.strip()} 360 | 361 | elif step_type in ['tool_call', 'data_fetch']: 362 | function_path = step.get('function') 363 | if not function_path: raise ValueError(f"Config Error: Missing 'function' path for tool step '{step_name}'.") 364 | params = step.get('params', {}) 365 | 366 | console.print(f"[magenta]Function:[/magenta] {function_path}") 367 | console.print(f"[magenta]Inputs (from context):[/magenta] {inputs}") 368 | console.print(f"[magenta]Params (fixed):[/magenta] {params}") 369 | 370 | result_data = self._call_tool(function_path, context, inputs, params) 371 | 372 | display_result = {} 373 | for k, v in result_data.items(): 374 | if isinstance(v, list): display_result[k] = f"[List with {len(v)} items]" 375 | elif isinstance(v, dict): display_result[k] = f"[Dict with {len(v)} keys]" 376 | elif isinstance(v, str) and len(v) > 150: display_result[k] = v[:150] + "..." 377 | else: display_result[k] = v 378 | console.print(Panel(str(display_result), title="Tool Result Stored", style="blue", border_style="blue")) 379 | 380 | else: 381 | error_msg = f"Config Error: Unknown step type '{step_type}' for step '{step_name}'. Halting." 
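# Only the LLM step types ('llm_agent', 'ollama_call', 'summary_ollama', 'novelty_check_ollama') and the tool step types ('tool_call', 'data_fetch') are handled above, so any other 'type' value in agents_config.yaml stops the pipeline here.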
382 | console.print(f"[bold red]Error: {error_msg}[/bold red]") 383 | raise ValueError(error_msg) 384 | 385 | for key in outputs: 386 | if key not in result_data: 387 | console.print(f"[yellow]Warning: Step '{step_name}' of type '{step_type}' did not produce expected output key '{key}'. Setting to None.[/yellow]") 388 | result_data[key] = None 389 | 390 | context.update(result_data) 391 | 392 | step_end_time = time.time() 393 | console.print(f"[green]Step '{step_name}' completed successfully in {step_end_time - step_start_time:.2f}s.[/green]\n") 394 | 395 | except (ValueError, ConnectionError, RuntimeError, TypeError, AttributeError, ImportError) as e: 396 | console.print(f"[bold red]---> Pipeline Error during step '{step_name}' <---[/bold red]") 397 | console.print(f"[red]Error Type:[/red] {type(e).__name__}") 398 | console.print(f"[red]Details:[/red] {e}") 399 | raise 400 | except Exception as e: 401 | console.print(f"[bold red]---> Unexpected Error during step '{step_name}' <---[/bold red]") 402 | console.print(f"[red]Error Type:[/red] {type(e).__name__} - {e}") 403 | console.print(f"[dim]{traceback.format_exc()}[/dim]") # Print full traceback 404 | raise # Re-raise to halt 405 | 406 | end_time = time.time() 407 | total_time = end_time - start_time 408 | console.rule(f"[bold green]Pipeline '{pipeline_name}' Finished in {total_time:.2f}s[/bold green]") 409 | 410 | return context 411 | 412 | 413 | # --- Output Saving & Display --- 414 | 415 | def slugify(text: str) -> str: 416 | """Simple slugify: keep alphanumeric, replace spaces/hyphens with underscores.""" 417 | import re 418 | if not isinstance(text, str): 419 | text = str(text) 420 | text = text.lower() 421 | text = re.sub(r'\s+', '_', text) 422 | text = re.sub(r'-+', '_', text) 423 | text = re.sub(r'[^\w_]+', '', text) 424 | text = text.strip('_') 425 | return text[:60] if text else "untitled" 426 | 427 | 428 | def save_results(result_context: Dict[str, Any], config: Dict[str, Any], topic: str, ollama_host: str): 429 | """Saves the pipeline results to text and JSON files.""" 430 | output_dir = os.path.join(os.path.dirname(__file__), 'output') 431 | if not os.path.exists(output_dir): 432 | try: 433 | os.makedirs(output_dir) 434 | console.print(f"[info]Created output directory:[/info] {output_dir}") 435 | except OSError as e: 436 | console.print(f"[red]Error creating output directory '{output_dir}': {e}. 
Results will not be saved.[/red]") 437 | return None, None # Indicate failure to save 438 | 439 | timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') 440 | unique_id = uuid.uuid4().hex[:8] 441 | topic_slug = slugify(topic) 442 | 443 | base_filename = f"result_{topic_slug}_{timestamp}_{unique_id}" 444 | txt_filename = f"{base_filename}.txt" 445 | json_filename = f"{base_filename}.json" 446 | txt_path = os.path.join(output_dir, txt_filename) 447 | json_path = os.path.join(output_dir, json_filename) 448 | 449 | # --- Output TXT file --- 450 | try: 451 | with open(txt_path, 'w', encoding='utf-8') as f: 452 | f.write(f"Pipeline: {config.get('pipeline_name', 'Untitled')} v{config.get('version', 'N/A')}\n") 453 | f.write(f"Topic: {topic}\n") 454 | f.write(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n") 455 | f.write("=" * 50 + "\n\n") 456 | 457 | f.write("[Final Outputs]\n") 458 | f.write("-" * 15 + "\n") 459 | final_output_keys = config.get('final_outputs', []) 460 | if not final_output_keys: 461 | default_final_keys = ['final_hypothesis_structured', 'novelty_assessment', 'top_papers'] 462 | f.write(f"(Note: 'final_outputs' not defined in config, showing defaults: {', '.join(default_final_keys)})\n\n") 463 | final_output_keys = default_final_keys 464 | 465 | for key in final_output_keys: 466 | f.write(f"--- {key.replace('_', ' ').title()} ---\n") 467 | if key in result_context and result_context[key] is not None: 468 | value = result_context[key] 469 | if key == 'top_papers' and isinstance(value, list): 470 | if value: 471 | f.write(agent_helpers.format_citations_rich(value)) 472 | else: 473 | f.write(" (No papers found or used)\n") 474 | elif isinstance(value, (dict, list)): 475 | try: 476 | f.write(json.dumps(value, indent=2, default=str)) 477 | except Exception: 478 | f.write(str(value)) 479 | elif isinstance(value, str): 480 | f.write(value) 481 | else: 482 | f.write(str(value)) 483 | f.write("\n\n") 484 | else: 485 | f.write(" [Not Generated or Not Found in Context]\n\n") 486 | 487 | # Optionally include selected context keys for debugging 488 | # debug_keys = ['literature_summary', 'breakdown', 'critical_review', 'synthesis'] 489 | # f.write("\n" + "="*50 + "\nDebugging Context:\n") 490 | # for key in debug_keys: 491 | # if key in result_context: 492 | # f.write(f"\n--- {key.replace('_', ' ').title()} ---\n{result_context[key]}\n") 493 | 494 | console.print(f"[green]Text results saved to:[/green] {txt_path}") 495 | except IOError as e: 496 | console.print(f"[red]Error writing text file '{txt_path}': {e}[/red]") 497 | txt_path = None 498 | except Exception as e: 499 | console.print(f"[red]Unexpected error writing text file '{txt_path}': {e}[/red]") 500 | txt_path = None 501 | 502 | # --- Output structured JSON file --- 503 | try: 504 | output_data = { 505 | 'pipeline_config_snapshot': config, 506 | 'run_info': { 507 | 'topic': topic, 508 | 'timestamp_iso': datetime.now().isoformat(), 509 | 'ollama_host_used': result_context.get('_pipeline_ollama_host', ollama_host) 510 | }, 511 | 'final_context': {} 512 | } 513 | 514 | serializable_context = {} 515 | for k, v in result_context.items(): 516 | try: 517 | json.dumps(v) 518 | serializable_context[k] = v 519 | except TypeError: 520 | # If direct serialization fails, convert known non-serializable types 521 | if isinstance(v, (datetime, uuid.UUID)): 522 | serializable_context[k] = str(v) 523 | else: 524 | try: 525 | serializable_context[k] = f"[Unserializable Type: {type(v).__name__}] {str(v)}" 526 | except Exception: # Catch 
errors during string conversion itself 527 | serializable_context[k] = f"[Unserializable Type: {type(v).__name__}] " 528 | except Exception as json_err: 529 | serializable_context[k] = f"[Serialization Check Error for key '{k}': {json_err}]" 530 | 531 | 532 | output_data['final_context'] = serializable_context 533 | 534 | with open(json_path, 'w', encoding='utf-8') as jf: 535 | json.dump(output_data, jf, ensure_ascii=False, indent=2) 536 | console.print(f"[green]JSON results saved to:[/green] {json_path}") 537 | except IOError as e: 538 | console.print(f"[red]Error writing JSON file '{json_path}': {e}[/red]") 539 | json_path = None 540 | except TypeError as e: 541 | console.print(f"[red]Error serializing results to JSON: {e}. Check context for complex objects.[/red]") 542 | json_path = None 543 | except Exception as e: 544 | console.print(f"[red]Unexpected error writing JSON file '{json_path}': {e}[/red]") 545 | json_path = None 546 | 547 | return txt_path, json_path 548 | 549 | 550 | def display_final_results(result_context: Dict[str, Any], config: Dict[str, Any]): 551 | """Displays key final results to the console using Rich.""" 552 | console.rule("[bold green]Final Results Summary[/bold green]") 553 | 554 | final_output_keys = config.get('final_outputs', []) 555 | if not final_output_keys: 556 | console.print("[yellow]Note: 'final_outputs' not defined in config. Displaying default summary keys.[/yellow]") 557 | final_output_keys = ['final_hypothesis_structured', 'novelty_assessment', 'top_papers', 'all_fetched_papers_count'] 558 | 559 | displayed_keys = set() 560 | 561 | # --- Display Prioritized Keys --- 562 | 563 | priority_display_order = ['final_hypothesis_structured', 'novelty_assessment', 'top_papers'] 564 | for key in priority_display_order: 565 | if key in final_output_keys and key in result_context: 566 | value = result_context[key] 567 | title = key.replace('_', ' ').title() 568 | if key == 'top_papers' and isinstance(value, list): 569 | if value: 570 | ref_table = Table(title="Top Papers Used (Max 10 Shown)", show_lines=True, expand=False, box=ROUNDED) 571 | ref_table.add_column("Title", style="bold", min_width=30, max_width=50, overflow="fold") 572 | ref_table.add_column("Score", justify="center", width=6) 573 | ref_table.add_column("Year", width=5) 574 | ref_table.add_column("Source", width=15) 575 | ref_table.add_column("URL", style="blue", no_wrap=True, overflow="ellipsis", max_width=40) 576 | for p in value[:10]: # Limit display to top 10 577 | ref_table.add_row( 578 | p.get('title','N/A'), f"{p.get('composite_score',0):.2f}", 579 | str(p.get('year','N/A')), p.get('source','N/A'), p.get('url','N/A') 580 | ) 581 | console.print(ref_table) 582 | if len(value) > 10: 583 | console.print(f"[dim](Showing top 10 of {len(value)} references used)[/dim]") 584 | else: 585 | console.print(Panel("[dim]No relevant papers found or used.[/dim]", title=title, border_style="dim")) 586 | elif value is not None: 587 | console.print(Panel(str(value), title=f"[bold yellow]{title}[/bold yellow]", border_style="yellow", expand=False)) 588 | displayed_keys.add(key) # Mark as displayed 589 | 590 | # --- Display Other Final Outputs --- 591 | other_keys_to_display = [ 592 | k for k in final_output_keys 593 | if k not in displayed_keys and k in result_context and result_context[k] is not None 594 | ] 595 | if other_keys_to_display: 596 | console.print("\n[bold]Other Final Outputs:[/bold]") 597 | for key in other_keys_to_display: 598 | title = key.replace('_', ' ').title() 599 | value = 
result_context[key] 600 | console.print(Panel(str(value), title=title, border_style="dim", expand=False)) 601 | displayed_keys.add(key) 602 | 603 | # --- Indicate if Expected Final Outputs Were Not Generated --- 604 | missing_final_keys = [k for k in final_output_keys if k not in result_context or result_context[k] is None] 605 | if missing_final_keys: 606 | console.print("\n[yellow]Note: The following expected final outputs were not generated or found:[/yellow]") 607 | for key in missing_final_keys: 608 | status = "[Not Found in Context]" if key not in result_context else "[Value is None]" 609 | console.print(f"- {key.replace('_', ' ').title()} {status}") 610 | 611 | 612 | # --- Interactive CLI Menu --- 613 | def main_menu(manager: PipelineManager, config: Dict[str, Any]): 614 | while True: 615 | console.clear(home=True) # Clear screen and move cursor home 616 | console.rule(f"[bold green]Abstract-Agent v{config.get('version', 'N/A')}: {config.get('pipeline_name', 'Config-Driven')}[/bold green]") 617 | table = Table(show_header=True, header_style="bold magenta", box=ROUNDED) 618 | table.add_column("No.", style="dim", width=4) 619 | table.add_column("Action", style="bold") 620 | table.add_row("1", "Run Research Generation Pipeline") 621 | table.add_row("2", "Exit") 622 | console.print(table) 623 | 624 | try: 625 | choice = IntPrompt.ask("[yellow]Choose an option[/yellow]", choices=["1", "2"], default=1) 626 | 627 | if choice == 1: 628 | handle_pipeline_execution(manager, config) 629 | elif choice == 2: 630 | console.print("\n[bold green]Exiting Abstract-Agent. Goodbye![/bold green]") 631 | sys.exit(0) 632 | 633 | except (KeyboardInterrupt, EOFError): 634 | console.print("\n\n[yellow]Operation cancelled by user. Exiting.[/yellow]") 635 | sys.exit(1) 636 | 637 | 638 | def handle_pipeline_execution(manager: PipelineManager, config: Dict[str, Any]): 639 | """Handles the process of getting user input and running the pipeline.""" 640 | console.rule("Start New Pipeline Run") 641 | default_topic = "Novel New LLM Compression Method" # Example default topic 642 | try: 643 | topic = Prompt.ask( 644 | f"Enter your research topic", 645 | default=default_topic, 646 | show_default=True 647 | ) 648 | except (KeyboardInterrupt, EOFError): 649 | console.print("\n[yellow]Input cancelled. Returning to menu.[/yellow]") 650 | return # Go back to main menu 651 | 652 | 653 | initial_context = {} 654 | required_init_keys = config.get('context_init', []) 655 | if 'topic' in required_init_keys: 656 | initial_context['topic'] = topic 657 | else: 658 | initial_context['topic'] = topic 659 | 660 | initial_context['_pipeline_ollama_host'] = manager.ollama_host 661 | initial_context['_pipeline_start_time_iso'] = datetime.now().isoformat() 662 | 663 | console.print(Panel.fit(f"[bold]Starting pipeline for topic:[/bold] [cyan]{topic}[/cyan]", title="Processing...", border_style="blue")) 664 | 665 | try: 666 | final_context = manager.run_pipeline(initial_context) 667 | display_final_results(final_context, config) 668 | txt_file, json_file = save_results(final_context, config, topic, manager.ollama_host) 669 | if not txt_file and not json_file: 670 | console.print("[yellow]Warning: Results could not be saved to file(s).[/yellow]") 671 | 672 | 673 | except (ValueError, ConnectionError, RuntimeError, TypeError, AttributeError, ImportError) as e: 674 | console.print(f"\n[bold red]---> Pipeline Execution Failed <---[/bold red]") 675 | console.print("[yellow]Pipeline halted due to error in a step. 
Check logs above.[/yellow]") 676 | except Exception as e: 677 | console.print(f"\n[bold red]---> An Unexpected Error Occurred Post-Pipeline <---[/bold red]") 678 | console.print(f"[red]Error Type:[/red] {type(e).__name__}") 679 | console.print(f"[red]Details:[/red] {e}") 680 | console.print(f"[dim]{traceback.format_exc()}[/dim]") 681 | 682 | # --- Wait for user before returning to menu --- 683 | try: 684 | Prompt.ask("\n[cyan]Press Enter to return to the main menu[/cyan]", default="") 685 | except (KeyboardInterrupt, EOFError): 686 | console.print("\n[yellow]Aborted. Returning to menu.[/yellow]") 687 | 688 | 689 | if __name__ == "__main__": 690 | try: 691 | pipeline_config = load_pipeline_config() 692 | manager = PipelineManager(config=pipeline_config) 693 | main_menu(manager, pipeline_config) 694 | except (KeyboardInterrupt, EOFError): 695 | print("\n\n[yellow]Application terminated by user.[/yellow]") 696 | sys.exit(1) 697 | except Exception as e: 698 | error_console = Console(stderr=True) 699 | error_console.print("\n[bold red]---> Fatal Error During Application Startup <---[/bold red]") 700 | error_console.print(f"[red]Error Type:[/red] {type(e).__name__}") 701 | error_console.print(f"[red]Details:[/red] {e}") 702 | error_console.print(f"[dim]{traceback.format_exc()}[/dim]") 703 | sys.exit(1) -------------------------------------------------------------------------------- /agent/agent_helpers.py: -------------------------------------------------------------------------------- 1 | # agent_helpers.py 2 | # Helper functions for the abstract-agent pipeline 3 | # Author: tegridydev 4 | # Repo: https://github.com/tegridydev/abstract-agent 5 | # License: MIT 6 | # Year: 2025 7 | 8 | from typing import List, Dict, Any 9 | import re 10 | 11 | def format_papers_for_prompt(top_papers: List[Dict[str, Any]], max_length: int = 500) -> str: 12 | """ 13 | Formats a list of paper dictionaries into a string suitable for LLM prompts. 14 | Accepts 'top_papers' as the keyword argument for the list. 15 | 16 | Args: 17 | top_papers: A list of dictionaries, where each dict represents a paper. 18 | max_length: The maximum character length for each paper's summary in the output string. 19 | Set to 0 or negative for no truncation. 20 | 21 | Returns: 22 | A formatted string listing the papers and their summaries. 23 | """ 24 | if not top_papers: 25 | return "No relevant papers found or provided." 26 | 27 | formatted_string = "" 28 | for i, p in enumerate(top_papers): 29 | title = p.get('title', 'N/A') 30 | year = p.get('year', 'N/A') 31 | source = p.get('source', 'N/A') 32 | summary = str(p.get('summary', '')).strip() 33 | 34 | if max_length > 0 and len(summary) > max_length: 35 | summary = summary[:max_length] + "..." 36 | 37 | formatted_string += f"Paper {i+1}:\n" 38 | formatted_string += f" Title: {title}\n" 39 | formatted_string += f" Source: {source} ({year})\n" 40 | # formatted_string += f" Authors: {authors}\n" # Uncomment if authors are needed 41 | formatted_string += f" Summary: {summary}\n\n" 42 | 43 | return formatted_string.strip() 44 | 45 | 46 | def format_citations_rich(citations: list) -> str: 47 | """ 48 | Formats citations list into a simple string summary suitable for console display (using Rich). 49 | This is NOT intended for feeding back into an LLM prompt. 50 | 51 | Args: 52 | citations: A list of paper dictionaries. 53 | 54 | Returns: 55 | A formatted string summarizing the citations for display. 56 | """ 57 | if not citations: 58 | return "No relevant papers found." 
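# Each paper contributes two display lines: "- [Source] Title (Year)" followed by an indented "Score: X.XX | URL: ..." line.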
59 | 60 | summary_lines = ["Top Relevant papers found:"] 61 | for paper in citations: 62 | source = paper.get('source', 'N/A') 63 | title = paper.get('title', 'N/A') 64 | year = paper.get('year', 'N/A') 65 | score = paper.get('composite_score', 0) 66 | url = paper.get('url', 'N/A') 67 | 68 | summary_lines.append(f"- [{source}] {title} ({year})") 69 | summary_lines.append(f" Score: {score:.2f} | URL: {url}") 70 | 71 | return "\n".join(summary_lines) 72 | 73 | # You can add more helper functions here as needed for the pipeline 74 | # For example, a function to parse the novelty score from text: 75 | def calculate_novelty_score_from_text(assessment_text: str) -> float: 76 | """ 77 | Parses Ollama novelty assessment text to extract a numerical score (e.g., X/10). 78 | Returns a float between 0.0 and 1.0, or 0.5 if parsing fails. 79 | 80 | Args: 81 | assessment_text: The text output from the novelty assessment LLM call. 82 | 83 | Returns: 84 | A float score between 0.0 and 1.0 (normalized from /10 scale) or 0.5 default. 85 | """ 86 | if not isinstance(assessment_text, str): 87 | return 0.5 88 | 89 | match = re.search(r'(?:score|rating)\s*(?:of|is|:)?\s*(\d+(?:\.\d+)?)\s*(?:/|out\s+of\s*)10', assessment_text, re.IGNORECASE) 90 | 91 | if match: 92 | try: 93 | score = float(match.group(1)) 94 | return max(0.0, min(1.0, score / 10.0)) 95 | except ValueError: 96 | print(f"[Warning] Could not parse score value '{match.group(1)}' as float.") 97 | return 0.5 98 | else: 99 | print("[Warning] Novelty score pattern (e.g., 'Score: X/10') not found in assessment text.") 100 | return 0.5 -------------------------------------------------------------------------------- /agent/agents_config.yaml: -------------------------------------------------------------------------------- 1 | # agents_config.yaml 2 | # Config Driven Agent Pipeline 3 | # Author: tegridydev 4 | # Repo: https://github.com/tegridydev/abstract-agent 5 | # License: MIT 6 | # Year: 2025 7 | 8 | pipeline_name: Research Hypothesis Generation 9 | version: 2.0 10 | ollama_host: 'http://localhost:11434' 11 | 12 | context_init: 13 | - topic 14 | 15 | steps: 16 | - name: Literature Search 17 | type: tool_call 18 | function: multi_source.aggregate_sources 19 | inputs: [topic] 20 | outputs: [top_papers, all_fetched_papers_count] 21 | params: 22 | max_results: 7 # Number of *top* papers to return after scoring 23 | summarize: false 24 | 25 | - name: Format Paper Abstracts for Prompt 26 | type: tool_call 27 | function: agent_helpers.format_papers_for_prompt 28 | inputs: [top_papers] 29 | outputs: [top_papers_abstracts_formatted] 30 | params: 31 | max_length: 500 32 | 33 | - name: Summarize Literature 34 | type: ollama_call 35 | model: gemma3:4b 36 | inputs: [topic, top_papers_abstracts_formatted] 37 | outputs: [literature_summary] 38 | prompt_template: | 39 | You are a research assistant. Summarize the key findings, trends, and gaps based on the following abstracts related to the topic '{topic}'. Be concise and insightful. 40 | 41 | Abstracts: 42 | {top_papers_abstracts_formatted} 43 | 44 | - name: Agent A - Breakdown 45 | type: llm_agent 46 | model: gemma3:4b 47 | persona: "Curious research assistant with encyclopedic knowledge and a drive to break down complex topics." 48 | inputs: [topic] 49 | outputs: [breakdown] 50 | prompt_template: | 51 | You are a {persona}. Your task is to break down the user's research topic into its core components. List all relevant facts, subtopics, and prior knowledge. Identify any foundational concepts or assumptions. 
Push yourself to be thorough and creative. 52 | 53 | Topic: {topic} 54 | 55 | - name: Agent B - Critique 56 | type: llm_agent 57 | model: gemma3:4b 58 | persona: "Skeptical literature reviewer who excels at critical analysis and finding gaps in current research." 59 | inputs: [topic, breakdown, literature_summary] 60 | outputs: [critical_review] 61 | prompt_template: | 62 | You are a {persona}. Review the following topic breakdown and the multi-source literature summary. Identify gaps, contradictions, or unexplored areas. Critique the current state of research, highlight hot trends, and flag anything that seems over- or under-explored. Be brutally honest and push for novelty. 63 | 64 | Topic: {topic} 65 | 66 | Breakdown: 67 | {breakdown} 68 | 69 | Literature Summary: 70 | {literature_summary} 71 | 72 | - name: Agent C - Synthesis 73 | type: llm_agent 74 | model: gemma3:4b 75 | persona: "Methodical synthesizer who integrates feedback and generates new connections." 76 | inputs: [topic, breakdown, critical_review] 77 | outputs: [synthesis] 78 | prompt_template: | 79 | You are a {persona}. Synthesize the breakdown and the critical review. Propose a refined research direction, combining the most promising ideas and addressing the gaps or contradictions. Think outside the box and suggest connections that aren't obvious. 80 | 81 | Topic: {topic} 82 | 83 | Breakdown: 84 | {breakdown} 85 | 86 | Critical Review: 87 | {critical_review} 88 | 89 | - name: Agent D - Novelty Generation 90 | type: llm_agent 91 | model: gemma3:4b 92 | persona: "Innovative thinker and wild idea generator who is not afraid to propose bold, unconventional hypotheses." 93 | inputs: [topic, synthesis] 94 | outputs: [novel_hypothesis] 95 | prompt_template: | 96 | You are {persona}. Take the refined direction based on the synthesis provided below. Propose ONLY ONE truly novel hypothesis that challenges assumptions or explores new territory regarding the topic '{topic}'. 97 | Justify why this hypothesis is novel and potentially impactful in a separate section titled 'Rationale for Novelty:'. 98 | Do NOT include your thinking process or any conversational filler. Output ONLY the 'Hypothesis:' section and the 'Rationale for Novelty:' section. 99 | 100 | [Synthesis Provided] 101 | {synthesis} 102 | 103 | [Your Output] 104 | Hypothesis: 105 | 106 | 107 | Rationale for Novelty: 108 | 109 | 110 | - name: Agent E - Academic Structuring 111 | type: llm_agent 112 | model: gemma3:4b 113 | persona: "Formal academic who polishes, structures, and references the final hypothesis for publication." 114 | inputs: [topic, novel_hypothesis, top_papers_abstracts_formatted] 115 | outputs: [final_hypothesis_structured] 116 | prompt_template: | 117 | You are a {persona}. 118 | Your task is to rewrite the provided 'Novel Hypothesis Text' into a concise, single-paragraph, publication-ready abstract for the topic '{topic}'. 119 | Focus on clarity, impact, and formal academic tone. The core novelty must be highlighted. 120 | Use the 'Literature Context' ONLY for background understanding and implicit referencing; DO NOT summarize the literature context in your output. 121 | Output ONLY the final abstract text, without any preamble or explanation. 
122 | 123 | [Novel Hypothesis Text to Format] 124 | {novel_hypothesis} 125 | 126 | [Literature Context - for background ONLY] 127 | {top_papers_abstracts_formatted} 128 | 129 | [Formatted Abstract Output] 130 | 131 | - name: Novelty Assessment 132 | type: ollama_call 133 | model: gemma3:4b 134 | inputs: [final_hypothesis_structured, top_papers_abstracts_formatted] 135 | outputs: [novelty_assessment] 136 | prompt_template: | 137 | You are an expert research evaluator. Assess how novel the proposed hypothesis is compared to the provided abstracts of related papers. Explain your reasoning. Score the novelty on a scale of 1 (not novel) to 10 (highly novel). 138 | 139 | [Hypothesis] 140 | {final_hypothesis_structured} 141 | 142 | [Related Paper Abstracts] 143 | {top_papers_abstracts_formatted} 144 | run_if: "context.get('final_hypothesis_structured')" 145 | 146 | final_outputs: 147 | - final_hypothesis_structured 148 | - novelty_assessment 149 | - top_papers 150 | - all_fetched_papers_count -------------------------------------------------------------------------------- /agent/multi_source.py: -------------------------------------------------------------------------------- 1 | # multi_source.py 2 | # abstract-agent 3 | # Author: tegridydev 4 | # Repo: https://github.com/tegridydev/abstract-agent 5 | # License: MIT 6 | # Year: 2025 7 | 8 | import os 9 | import datetime 10 | from typing import List, Tuple, Dict, Any 11 | import arxiv 12 | import requests 13 | import xmltodict 14 | from ollama import Client 15 | 16 | OLLAMA_HOST = os.environ.get('OLLAMA_HOST', 'http://localhost:11434') 17 | 18 | def fetch_semanticscholar(topic, max_results=3) -> List[Dict[str, Any]]: 19 | """Fetches papers from Semantic Scholar API.""" 20 | url = f"https://api.semanticscholar.org/graph/v1/paper/search?query={topic}&limit={max_results}&fields=title,authors,year,abstract,url,citationCount" # Added citationCount 21 | results = [] 22 | try: 23 | resp = requests.get(url, timeout=10) 24 | resp.raise_for_status() 25 | data = resp.json() 26 | for paper in data.get('data', []): 27 | if paper.get('title') and paper.get('abstract'): 28 | results.append({ 29 | 'title': paper.get('title'), 30 | 'authors': ', '.join([a['name'] for a in paper.get('authors', []) if a.get('name')]), 31 | 'year': paper.get('year'), 32 | 'summary': paper.get('abstract', ''), 33 | 'url': paper.get('url'), 34 | 'citations': paper.get('citationCount', 0), # Get citation count 35 | 'source': 'Semantic Scholar', 36 | }) 37 | except requests.exceptions.RequestException as e: 38 | print(f"[Error] Semantic Scholar request failed: {e}") 39 | except Exception as e: 40 | print(f"[Error] Semantic Scholar processing failed: {e}") 41 | return results 42 | 43 | def fetch_europepmc(topic, max_results=3) -> List[Dict[str, Any]]: 44 | """Fetches papers from Europe PMC API.""" 45 | url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/search?query={topic}&format=json&pageSize={max_results}&resultType=core" # Ensure resultType=core for citations 46 | results = [] 47 | try: 48 | resp = requests.get(url, timeout=10) 49 | resp.raise_for_status() 50 | data = resp.json() 51 | for paper in data.get('resultList', {}).get('result', []): 52 | if paper.get('title') and paper.get('abstractText'): 53 | results.append({ 54 | 'title': paper.get('title'), 55 | 'authors': paper.get('authorString', ''), 56 | 'year': paper.get('pubYear', ''), 57 | 'summary': paper.get('abstractText', ''), 58 | 'url': paper.get('doi', '') and f"https://doi.org/{paper['doi']}" or 
paper.get('fullTextUrlList', {}).get('fullTextUrl', [{}])[0].get('url', ''), 59 | 'citations': paper.get('citedByCount', 0), # Get citation count 60 | 'source': 'EuropePMC', 61 | }) 62 | except requests.exceptions.RequestException as e: 63 | print(f"[Error] EuropePMC request failed: {e}") 64 | except Exception as e: 65 | print(f"[Error] EuropePMC processing failed: {e}") 66 | return results 67 | 68 | def fetch_crossref(topic, max_results=3) -> List[Dict[str, Any]]: 69 | """Fetches papers from Crossref API.""" 70 | url = f"https://api.crossref.org/works?query={topic}&rows={max_results}&filter=has-abstract:true" # Filter for abstracts 71 | results = [] 72 | try: 73 | resp = requests.get(url, timeout=10) 74 | resp.raise_for_status() 75 | data = resp.json() 76 | for item in data.get('message', {}).get('items', []): 77 | abstract = item.get('abstract', '') 78 | if isinstance(abstract, str) and abstract.strip().startswith('<'): 79 | try: 80 | import re 81 | abstract = re.sub('<[^>]*>', '', abstract).strip() 82 | except Exception: 83 | pass # Keep original if regex fails somehow 84 | elif isinstance(abstract, list): 85 | abstract = ' '.join(str(a) for a in abstract) 86 | elif not isinstance(abstract, str): 87 | abstract = str(abstract) 88 | 89 | title_list = item.get('title', []) 90 | title = title_list[0] if title_list else None 91 | if title and abstract: 92 | results.append({ 93 | 'title': title, 94 | 'authors': ', '.join([f"{a.get('given','')} {a.get('family','')}".strip() for a in item.get('author', []) if isinstance(a, dict)]) if 'author' in item else '', 95 | 'year': item.get('published-print', {}).get('date-parts', [[None]])[0][0] or item.get('created', {}).get('date-parts', [[None]])[0][0], 96 | 'summary': abstract, 97 | 'url': item.get('URL', ''), 98 | 'citations': item.get('is-referenced-by-count', 0), # Get citation count 99 | 'source': 'Crossref', 100 | }) 101 | except requests.exceptions.RequestException as e: 102 | print(f"[Error] Crossref request failed: {e}") 103 | except Exception as e: 104 | print(f"[Error] Crossref processing failed: {e}") 105 | return results 106 | 107 | def fetch_doaj(topic, max_results=3) -> List[Dict[str, Any]]: 108 | """Fetches papers from DOAJ API.""" 109 | url = f"https://doaj.org/api/v2/search/articles/{topic}?page=1&pageSize={max_results}" 110 | results = [] 111 | try: 112 | resp = requests.get(url, timeout=10) 113 | resp.raise_for_status() 114 | data = resp.json() 115 | for item in data.get('results', []): 116 | bib = item.get('bibjson', {}) 117 | if bib.get('title') and bib.get('abstract'): 118 | results.append({ 119 | 'title': bib.get('title', ''), 120 | 'authors': ', '.join([a.get('name', '') for a in bib.get('author', []) if isinstance(a, dict)]), 121 | 'year': bib.get('year', ''), 122 | 'summary': bib.get('abstract', ''), 123 | 'url': next((link.get('url') for link in bib.get('link', []) if link.get('type') == 'fulltext'), bib.get('link', [{}])[0].get('url', '')), # Prefer fulltext link 124 | 'citations': 0, # DOAJ API doesn't typically return citation counts 125 | 'source': 'DOAJ', 126 | }) 127 | except requests.exceptions.RequestException as e: 128 | print(f"[Error] DOAJ request failed: {e}") 129 | except Exception as e: 130 | print(f"[Error] DOAJ processing failed: {e}") 131 | return results 132 | 133 | def fetch_biorxiv(topic, max_results=3) -> List[Dict[str, Any]]: 134 | """Placeholder for bioRxiv/medRxiv fetch. Currently unreliable.""" 135 | # Note: bioRxiv API is tricky for arbitrary topic searches. 
The previous example used details endpoint structure incorrectly. 136 | # A proper implementation might require specific API endpoints or libraries not used here. 137 | # For now, returning empty to avoid misleading results or errors from a likely incorrect URL structure/method. 138 | print(f"[Warning] bioRxiv/medRxiv search implementation is currently disabled due to API limitations/complexity.") 139 | return [] 140 | 141 | def fetch_pubmed(topic, max_results=3) -> List[Dict[str, Any]]: 142 | """Fetches papers from PubMed API using E-utilities.""" 143 | try: 144 | # ESearch to get PMIDs 145 | search_url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax={max_results}&term={topic}&retmode=json" 146 | search_resp = requests.get(search_url, timeout=10) 147 | search_resp.raise_for_status() # Raise error for bad responses 148 | id_list = search_resp.json().get('esearchresult', {}).get('idlist', []) 149 | if not id_list: 150 | return [] 151 | 152 | ids = ','.join(id_list) 153 | fetch_url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id={ids}&retmode=xml" 154 | fetch_resp = requests.get(fetch_url, timeout=15) 155 | fetch_resp.raise_for_status() 156 | 157 | results = [] 158 | docs = xmltodict.parse(fetch_resp.text) 159 | articles_data = docs.get('PubmedArticleSet', {}) 160 | if not articles_data: 161 | print(f"[Warning] PubMed EFetch returned unexpected XML structure. Top keys: {list(docs.keys())}") 162 | return [] 163 | articles = articles_data.get('PubmedArticle', []) 164 | if not isinstance(articles, list): 165 | articles = [articles] if articles else [] 166 | 167 | for art in articles: 168 | medline_citation = art.get('MedlineCitation', {}) 169 | if not medline_citation: continue 170 | pmid_obj = medline_citation.get('PMID') 171 | pmid = pmid_obj.get('#text') if isinstance(pmid_obj, dict) else pmid_obj 172 | if not pmid: continue 173 | 174 | article_info = medline_citation.get('Article', {}) 175 | if not article_info: continue 176 | 177 | title_obj = article_info.get('ArticleTitle', '') 178 | title = title_obj.get('#text', '') if isinstance(title_obj, dict) else str(title_obj) 179 | 180 | authors_list = article_info.get('AuthorList', {}).get('Author', []) 181 | if authors_list and not isinstance(authors_list, list): 182 | authors_list = [authors_list] 183 | authors = ', '.join( 184 | f"{a.get('LastName', '')} {a.get('Initials', '')}".strip() 185 | for a in authors_list if isinstance(a, dict) and a.get('LastName') 186 | ) 187 | 188 | pub_date = article_info.get('Journal', {}).get('JournalIssue', {}).get('PubDate', {}) 189 | year_obj = pub_date.get('Year', '') 190 | year = year_obj.get('#text') if isinstance(year_obj, dict) else str(year_obj) 191 | if not year: 192 | medline_date = pub_date.get('MedlineDate', '') 193 | if isinstance(medline_date, str): 194 | year = medline_date.split(' ')[0] 195 | 196 | abstract_section = article_info.get('Abstract', {}) 197 | if not abstract_section: 198 | continue 199 | abstract_text_obj = abstract_section.get('AbstractText', '') 200 | abstract = '' 201 | if isinstance(abstract_text_obj, list): 202 | abstract_list = [] 203 | for part in abstract_text_obj: 204 | if isinstance(part, str): 205 | abstract_list.append(part) 206 | elif isinstance(part, dict): 207 | label = part.get('@Label', '') 208 | text = part.get('#text', '') 209 | if label and text: 210 | abstract_list.append(f"{label}: {text}") 211 | elif text: 212 | abstract_list.append(text) 213 | abstract = ' '.join(abstract_list).strip() 214 | elif 
isinstance(abstract_text_obj, dict): 215 | abstract = abstract_text_obj.get('#text', '') 216 | elif isinstance(abstract_text_obj, str): 217 | abstract = abstract_text_obj 218 | elif abstract_text_obj is not None: 219 | abstract = str(abstract_text_obj) 220 | if title and abstract: 221 | results.append({ 222 | 'title': title, 223 | 'authors': authors, 224 | 'year': year, 225 | 'summary': abstract.strip(), 226 | 'url': f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/", 227 | 'citations': 0, 228 | 'source': 'PubMed', 229 | }) 230 | return results 231 | except requests.exceptions.RequestException as e: 232 | print(f"[Error] PubMed request failed: {e}") 233 | return [] 234 | except Exception as e: 235 | print(f"[Error] PubMed parsing failed: {e}") 236 | import traceback 237 | print(traceback.format_exc()) 238 | return [] 239 | 240 | 241 | def fetch_openalex(topic, max_results=3) -> List[Dict[str, Any]]: 242 | """Fetches papers from OpenAlex API.""" 243 | my_email = os.environ.get("OPENALEX_EMAIL", "anonymous@example.com") 244 | headers = {'User-Agent': f'AbstractAgent/1.0 (mailto:{my_email})'} 245 | url = f"https://api.openalex.org/works?search={topic}&per-page={max_results}&filter=has_abstract:true" 246 | results = [] 247 | try: 248 | resp = requests.get(url, headers=headers, timeout=10) 249 | resp.raise_for_status() 250 | data = resp.json() 251 | for paper in data.get('results', []): 252 | summary = '' 253 | abs_inv = paper.get('abstract_inverted_index') 254 | if isinstance(abs_inv, dict): 255 | try: 256 | if not abs_inv: 257 | summary = "" 258 | else: 259 | max_len = max(max(positions) for positions in abs_inv.values() if positions) 260 | ordered_words = [''] * (max_len + 1) 261 | for word, positions in abs_inv.items(): 262 | if positions: 263 | for pos in positions: 264 | if 0 <= pos <= max_len: 265 | ordered_words[pos] = word 266 | summary = ' '.join(filter(None, ordered_words)) 267 | except (ValueError, TypeError) as e: 268 | print(f"[Warning] OpenAlex abstract reconstruction failed for paper '{paper.get('id')}': {e}") 269 | summary = "[Abstract not available or invalid format]" 270 | elif isinstance(abs_inv, str): 271 | summary = abs_inv 272 | title = paper.get('title') 273 | if title and summary: 274 | results.append({ 275 | 'title': title, 276 | 'authors': ', '.join([a['author']['display_name'] for a in paper.get('authorships', []) if a.get('author') and a['author'].get('display_name')]), 277 | 'year': paper.get('publication_year'), 278 | 'summary': summary, 279 | 'url': paper.get('doi') or paper.get('id'), 280 | 'citations': paper.get('cited_by_count', 0), 281 | 'source': 'OpenAlex', 282 | }) 283 | except requests.exceptions.RequestException as e: 284 | print(f"[Error] OpenAlex request failed: {e}") 285 | except Exception as e: 286 | print(f"[Error] OpenAlex processing failed: {e}") 287 | return results 288 | 289 | 290 | def fetch_arxiv(topic, max_results=3) -> List[Dict[str, Any]]: 291 | """Fetches papers from arXiv API using the official library.""" 292 | results = [] 293 | try: 294 | client = arxiv.Client( 295 | page_size = max_results, 296 | delay_seconds = 3, 297 | num_retries = 3 298 | ) 299 | search = arxiv.Search( 300 | query=topic, 301 | max_results=max_results, 302 | sort_by=arxiv.SortCriterion.Relevance 303 | ) 304 | for paper in client.results(search): 305 | if paper.title and paper.summary: 306 | results.append({ 307 | 'title': paper.title, 308 | 'authors': ', '.join([a.name for a in paper.authors if a.name]), 309 | 'year': paper.published.year if paper.published else None, 310 
| 'summary': paper.summary.strip(), 311 | 'url': paper.entry_id, 312 | 'citations': 0, 313 | 'source': 'arXiv', 314 | }) 315 | if len(results) >= max_results: 316 | break 317 | except Exception as e: 318 | print(f"[Error] arXiv fetch failed: {e}") 319 | return results 320 | 321 | 322 | # --- Aggregation Logic --- 323 | 324 | def normalize_year(year: Any) -> float: 325 | """Normalizes year to a recency score (0.0 to 1.0).""" 326 | try: 327 | year_int = int(year) 328 | this_year = datetime.datetime.now().year 329 | age = max(0, this_year - year_int) 330 | recency_score = max(0.0, 1.0 - min(age, 20) / 20.0) 331 | return recency_score 332 | except (ValueError, TypeError, AttributeError): 333 | return 0.5 334 | 335 | 336 | def normalize_citations(cites: Any, max_cites: int) -> float: 337 | """Normalizes citation count to a score (0.0 to 1.0) based on max found.""" 338 | try: 339 | cites_int = int(cites) 340 | if max_cites <= 0: 341 | return 0.0 342 | return min(1.0, cites_int / max_cites) 343 | except (ValueError, TypeError): 344 | return 0.0 345 | 346 | 347 | def simple_relevance_score(topic: str, title: str, summary: str) -> float: 348 | """Calculates a basic relevance score based on keyword overlap.""" 349 | try: 350 | topic_words = set(filter(None, topic.lower().split())) 351 | text = (str(title) or '') + ' ' + (str(summary) or '') 352 | text_words = set(filter(None, text.lower().split())) 353 | if not topic_words: 354 | return 0.0 355 | overlap = len(topic_words.intersection(text_words)) 356 | return overlap / len(topic_words) 357 | except Exception as e: 358 | print(f"[Warning] Failed to calculate relevance score: {e}") 359 | return 0.0 360 | 361 | 362 | def aggregate_sources(topic: str, max_results: int = 5, **kwargs) -> Tuple[List[Dict[str, Any]], int]: 363 | """ 364 | Fetches papers from all sources, deduplicates, scores, ranks, 365 | and returns the top N papers and the total count before deduplication. 366 | 367 | Args: 368 | topic: The research topic query. 369 | max_results: The maximum number of *top* papers to return after ranking. 370 | **kwargs: Catches unused parameters from potential older calls or config. 371 | 372 | Returns: 373 | A tuple containing: 374 | - list: Top N ranked papers (dictionaries). 375 | - int: Total number of papers fetched across all sources before deduplication. 
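    Example (illustrative usage sketch added by the editor, not part of the original
    source; it requires network access to the live third-party APIs, so exact papers
    and counts will vary between runs):

        >>> papers, total = aggregate_sources("LLM compression", max_results=3)
        >>> len(papers) <= 3
        True
        >>> papers == sorted(papers, key=lambda p: p["composite_score"], reverse=True)
        True
        >>> # composite_score = 0.5 * relevance + 0.3 * recency + 0.2 * citations,
        >>> # as computed in the scoring loop below.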
376 | """ 377 | all_fetched_results: List[Dict[str, Any]] = [] 378 | fetch_functions = [ 379 | fetch_arxiv, fetch_semanticscholar, fetch_pubmed, fetch_openalex, 380 | fetch_europepmc, fetch_crossref, fetch_doaj 381 | ] 382 | for fetch_func in fetch_functions: 383 | print(f"Fetching from {fetch_func.__name__}...") 384 | try: 385 | results = fetch_func(topic, max_results=max_results + 5) 386 | print(f" -> Found {len(results)} results from {fetch_func.__name__}.") 387 | if results: 388 | all_fetched_results.extend(results) 389 | except Exception as e: 390 | print(f"[Warning] Failed to fetch or process from {fetch_func.__name__}: {e}") 391 | 392 | total_fetched_count = len(all_fetched_results) 393 | print(f"\nTotal fetched before deduplication: {total_fetched_count}") 394 | 395 | seen_identifiers = set() 396 | deduped = [] 397 | for item in all_fetched_results: 398 | title = item.get('title','').lower().strip() 399 | import re 400 | title_norm = re.sub(r'[^\w\s]', '', title) 401 | title_norm = ' '.join(title_norm.split()) 402 | 403 | url = item.get('url', '').lower().strip() 404 | identifier = url if 'doi.org' in url else title_norm 405 | 406 | if identifier and identifier not in seen_identifiers: 407 | if item.get('title') and len(str(item.get('summary', ''))) > 50: 408 | item['citations'] = item.get('citations', 0) 409 | item['year'] = item.get('year', None) 410 | 411 | deduped.append(item) 412 | seen_identifiers.add(identifier) 413 | 414 | print(f"Total after deduplication and quality filtering: {len(deduped)}") 415 | 416 | max_cites = 0 417 | for paper in deduped: 418 | try: 419 | cites = int(paper.get('citations', 0)) 420 | paper['citations'] = cites 421 | if cites > max_cites: 422 | max_cites = cites 423 | except (ValueError, TypeError): 424 | paper['citations'] = 0 425 | 426 | print(f"Maximum citations found in batch: {max_cites}") 427 | 428 | scored_papers = [] 429 | for paper in deduped: 430 | rel = simple_relevance_score(topic, paper.get('title',''), paper.get('summary','')) 431 | year_score = normalize_year(paper.get('year')) 432 | cite_score = normalize_citations(paper.get('citations', 0), max_cites) 433 | 434 | score = (0.5 * rel) + (0.3 * year_score) + (0.2 * cite_score) 435 | 436 | paper['relevance_score'] = round(rel, 3) 437 | paper['recency_score'] = round(year_score, 3) 438 | paper['citation_score'] = round(cite_score, 3) 439 | paper['composite_score'] = round(score, 3) 440 | scored_papers.append(paper) 441 | 442 | scored_papers.sort(key=lambda x: x['composite_score'], reverse=True) 443 | 444 | top_papers = scored_papers[:max_results] 445 | print(f"Returning top {len(top_papers)} papers.") 446 | 447 | return top_papers, total_fetched_count -------------------------------------------------------------------------------- /agent/output/sample_output.json: -------------------------------------------------------------------------------- 1 | { 2 | "pipeline_config_snapshot": { 3 | "pipeline_name": "Research Hypothesis Generation", 4 | "version": 1.1, 5 | "ollama_host": "http://localhost:11434", 6 | "context_init": [ 7 | "topic" 8 | ], 9 | "steps": [ 10 | { 11 | "name": "Literature Search", 12 | "type": "tool_call", 13 | "function": "multi_source.aggregate_sources", 14 | "inputs": [ 15 | "topic" 16 | ], 17 | "outputs": [ 18 | "top_papers", 19 | "all_fetched_papers_count" 20 | ], 21 | "params": { 22 | "max_results": 7, 23 | "summarize": false 24 | } 25 | }, 26 | { 27 | "name": "Format Paper Abstracts for Prompt", 28 | "type": "tool_call", 29 | "function": 
"agent_helpers.format_papers_for_prompt", 30 | "inputs": [ 31 | "top_papers" 32 | ], 33 | "outputs": [ 34 | "top_papers_abstracts_formatted" 35 | ], 36 | "params": { 37 | "max_length": 500 38 | } 39 | }, 40 | { 41 | "name": "Summarize Literature", 42 | "type": "ollama_call", 43 | "model": "gemma3:4b", 44 | "inputs": [ 45 | "topic", 46 | "top_papers_abstracts_formatted" 47 | ], 48 | "outputs": [ 49 | "literature_summary" 50 | ], 51 | "prompt_template": "You are a research assistant. Summarize the key findings, trends, and gaps based on the following abstracts related to the topic '{topic}'. Be concise and insightful.\n\nAbstracts:\n{top_papers_abstracts_formatted}\n" 52 | }, 53 | { 54 | "name": "Agent A - Breakdown", 55 | "type": "llm_agent", 56 | "model": "gemma3:4b", 57 | "persona": "Curious research assistant with encyclopedic knowledge and a drive to break down complex topics.", 58 | "inputs": [ 59 | "topic" 60 | ], 61 | "outputs": [ 62 | "breakdown" 63 | ], 64 | "prompt_template": "You are a {persona}. Your task is to break down the user's research topic into its core components. List all relevant facts, subtopics, and prior knowledge. Identify any foundational concepts or assumptions. Push yourself to be thorough and creative.\n\nTopic: {topic}\n" 65 | }, 66 | { 67 | "name": "Agent B - Critique", 68 | "type": "llm_agent", 69 | "model": "gemma3:4b", 70 | "persona": "Skeptical literature reviewer who excels at critical analysis and finding gaps in current research.", 71 | "inputs": [ 72 | "topic", 73 | "breakdown", 74 | "literature_summary" 75 | ], 76 | "outputs": [ 77 | "critical_review" 78 | ], 79 | "prompt_template": "You are a {persona}. Review the following topic breakdown and the multi-source literature summary. Identify gaps, contradictions, or unexplored areas. Critique the current state of research, highlight hot trends, and flag anything that seems over- or under-explored. Be brutally honest and push for novelty.\n\nTopic: {topic}\n\nBreakdown:\n{breakdown}\n\nLiterature Summary:\n{literature_summary}\n" 80 | }, 81 | { 82 | "name": "Agent C - Synthesis", 83 | "type": "llm_agent", 84 | "model": "gemma3:4b", 85 | "persona": "Methodical synthesizer who integrates feedback and generates new connections.", 86 | "inputs": [ 87 | "topic", 88 | "breakdown", 89 | "critical_review" 90 | ], 91 | "outputs": [ 92 | "synthesis" 93 | ], 94 | "prompt_template": "You are a {persona}. Synthesize the breakdown and the critical review. Propose a refined research direction, combining the most promising ideas and addressing the gaps or contradictions. Think outside the box and suggest connections that aren't obvious.\n\nTopic: {topic}\n\nBreakdown:\n{breakdown}\n\nCritical Review:\n{critical_review}\n" 95 | }, 96 | { 97 | "name": "Agent D - Novelty Generation", 98 | "type": "llm_agent", 99 | "model": "gemma3:4b", 100 | "persona": "Innovative thinker and wild idea generator who is not afraid to propose bold, unconventional hypotheses.", 101 | "inputs": [ 102 | "topic", 103 | "synthesis" 104 | ], 105 | "outputs": [ 106 | "novel_hypothesis" 107 | ], 108 | "prompt_template": "You are {persona}. Take the refined direction based on the synthesis provided below. Propose ONLY ONE truly novel hypothesis that challenges assumptions or explores new territory regarding the topic '{topic}'.\nJustify why this hypothesis is novel and potentially impactful in a separate section titled 'Rationale for Novelty:'.\nDo NOT include your thinking process or any conversational filler. 
Output ONLY the 'Hypothesis:' section and the 'Rationale for Novelty:' section.\n\n[Synthesis Provided]\n{synthesis}\n\n[Your Output]\nHypothesis:\n\n\nRationale for Novelty:\n\n" 109 | }, 110 | { 111 | "name": "Agent E - Academic Structuring", 112 | "type": "llm_agent", 113 | "model": "gemma3:4b", 114 | "persona": "Formal academic who polishes, structures, and references the final hypothesis for publication.", 115 | "inputs": [ 116 | "topic", 117 | "novel_hypothesis", 118 | "top_papers_abstracts_formatted" 119 | ], 120 | "outputs": [ 121 | "final_hypothesis_structured" 122 | ], 123 | "prompt_template": "You are a {persona}.\nYour task is to rewrite the provided 'Novel Hypothesis Text' into a concise, single-paragraph, publication-ready abstract for the topic '{topic}'.\nFocus on clarity, impact, and formal academic tone. The core novelty must be highlighted.\nUse the 'Literature Context' ONLY for background understanding and implicit referencing; DO NOT summarize the literature context in your output.\nOutput ONLY the final abstract text, without any preamble or explanation.\n\n[Novel Hypothesis Text to Format]\n{novel_hypothesis}\n\n[Literature Context - for background ONLY]\n{top_papers_abstracts_formatted}\n\n[Formatted Abstract Output]\n" 124 | }, 125 | { 126 | "name": "Novelty Assessment", 127 | "type": "ollama_call", 128 | "model": "gemma3:4b", 129 | "inputs": [ 130 | "final_hypothesis_structured", 131 | "top_papers_abstracts_formatted" 132 | ], 133 | "outputs": [ 134 | "novelty_assessment" 135 | ], 136 | "prompt_template": "You are an expert research evaluator. Assess how novel the proposed hypothesis is compared to the provided abstracts of related papers. Explain your reasoning. Score the novelty on a scale of 1 (not novel) to 10 (highly novel).\n\n[Hypothesis]\n{final_hypothesis_structured}\n\n[Related Paper Abstracts]\n{top_papers_abstracts_formatted}\n", 137 | "run_if": "context.get('final_hypothesis_structured')" 138 | } 139 | ], 140 | "final_outputs": [ 141 | "final_hypothesis_structured", 142 | "novelty_assessment", 143 | "top_papers", 144 | "all_fetched_papers_count" 145 | ] 146 | }, 147 | "run_info": { 148 | "topic": "Novel New LLM Compression Method", 149 | "timestamp_iso": "2025-04-30", 150 | "ollama_host_used": "http://localhost:11434" 151 | }, 152 | "final_context": { 153 | "topic": "Novel New LLM Compression Method", 154 | "_pipeline_ollama_host": "http://localhost:11434", 155 | "_pipeline_start_time_iso": "2025-04-30", 156 | "top_papers": [ 157 | { 158 | "title": "Mosaic: Composite Projection Pruning for Resource-efficient LLMs", 159 | "authors": "Bailey J. Eccles, Leon Wong, Blesson Varghese", 160 | "year": 2025, 161 | "summary": "Extensive compute and memory requirements limit the deployment of large\nlanguage models (LLMs) on any hardware. Compression methods, such as pruning,\ncan reduce model size, which in turn reduces resource requirements.\nState-of-the-art pruning is based on coarse-grained methods. They are\ntime-consuming and inherently remove critical model parameters, adversely\nimpacting the quality of the pruned model. This paper introduces projection\npruning, a novel fine-grained method for pruning LLMs. In addition, LLM\nprojection pruning is enhanced by a new approach we refer to as composite\nprojection pruning - the synergistic combination of unstructured pruning that\nretains accuracy and structured pruning that reduces model size. We develop\nMosaic, a novel system to create and deploy pruned LLMs using composite\nprojection pruning. 
Mosaic is evaluated using a range of performance and\nquality metrics on multiple hardware platforms, LLMs, and datasets. Mosaic is\n7.19x faster in producing models than existing approaches. Mosaic models\nachieve up to 84.2% lower perplexity and 31.4% higher accuracy than models\nobtained from coarse-grained pruning. Up to 67% faster inference and 68% lower\nGPU memory use is noted for Mosaic models.", 162 | "url": "http://arxiv.org/abs/2504.06323v1", 163 | "citations": 0, 164 | "source": "arXiv", 165 | "relevance_score": 1.0, 166 | "recency_score": 1.0, 167 | "citation_score": 0.0, 168 | "composite_score": 0.8 169 | }, 170 | { 171 | "title": "DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance", 172 | "authors": "Xuanfan Ni, Liyan Xu, Chenyang Lyu, Longyue Wang, Mo Yu, Lemao Liu, Fandong Meng, Jie Zhou, Piji Li", 173 | "year": 2025, 174 | "summary": "To alleviate memory burden during inference of large language models (LLMs),\nnumerous studies have focused on compressing the KV cache by exploring aspects\nsuch as attention sparsity. However, these techniques often require a\npre-defined cache budget; as the optimal budget varies with different input\nlengths and task types, it limits their practical deployment accepting\nopen-domain instructions. To address this limitation, we propose a new KV cache\ncompression objective: to always ensure the full-cache performance regardless\nof specific inputs, while maximizing KV cache pruning as much as possible. To\nachieve this goal, we introduce a novel KV cache compression method dubbed\nDBudgetKV, which features an attention-based metric to signal when the\nremaining KV cache is unlikely to match the full-cache performance, then\nhalting the pruning process. Empirical evaluation spanning diverse context\nlengths, task types, and model sizes suggests that our method achieves lossless\nKV pruning effectively and robustly, exceeding 25% compression ratio on\naverage. Furthermore, our method is easy to integrate within LLM inference, not\nonly optimizing memory space, but also showing reduced inference time compared\nto existing methods.", 175 | "url": "http://arxiv.org/abs/2502.16886v1", 176 | "citations": 0, 177 | "source": "arXiv", 178 | "relevance_score": 1.0, 179 | "recency_score": 1.0, 180 | "citation_score": 0.0, 181 | "composite_score": 0.8 182 | }, 183 | { 184 | "title": "Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference", 185 | "authors": "Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke", 186 | "year": 2024, 187 | "summary": "Large language models (LLMs) have triggered a new stream of research focusing\non compressing the context length to reduce the computational cost while\nensuring the retention of helpful information for LLMs to answer the given\nquestion. Token-based removal methods are one of the most prominent approaches\nin this direction, but risk losing the semantics of the context caused by\nintermediate token removal, especially under high compression ratios, while\nalso facing challenges in computational efficiency. In this work, we propose\ncontext-aware prompt compression (CPC), a sentence-level prompt compression\ntechnique where its key innovation is a novel context-aware sentence encoder\nthat provides a relevance score for each sentence for a given question. 
To\ntrain this encoder, we generate a new dataset consisting of questions,\npositives, and negative pairs where positives are sentences relevant to the\nquestion, while negatives are irrelevant context sentences. We train the\nencoder in a contrastive setup to learn context-aware sentence representations.\nOur method considerably outperforms prior works on prompt compression on\nbenchmark datasets and is up to 10.93x faster at inference compared to the best\ntoken-level compression method. We also find better improvement for shorter\nlength constraints in most benchmarks, showing the effectiveness of our\nproposed solution in the compression of relevant information in a shorter\ncontext. Finally, we release the code and the dataset for quick reproducibility\nand further development: https://github.com/Workday/cpc.", 188 | "url": "http://arxiv.org/abs/2409.01227v3", 189 | "citations": 0, 190 | "source": "arXiv", 191 | "relevance_score": 1.0, 192 | "recency_score": 0.95, 193 | "citation_score": 0.0, 194 | "composite_score": 0.785 195 | }, 196 | { 197 | "title": "LightThinker: Thinking Step-by-Step Compression", 198 | "authors": "Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang", 199 | "year": 2025, 200 | "summary": "Large language models (LLMs) have shown remarkable performance in complex\nreasoning tasks, but their efficiency is hindered by the substantial memory and\ncomputational costs associated with generating lengthy tokens. In this paper,\nwe propose LightThinker, a novel method that enables LLMs to dynamically\ncompress intermediate thoughts during reasoning. Inspired by human cognitive\nprocesses, LightThinker compresses verbose thought steps into compact\nrepresentations and discards the original reasoning chains, thereby\nsignificantly reducing the number of tokens stored in the context window. This\nis achieved by training the model on when and how to perform compression\nthrough data construction, mapping hidden states to condensed gist tokens, and\ncreating specialized attention masks. Additionally, we introduce the Dependency\n(Dep) metric to quantify the degree of compression by measuring the reliance on\nhistorical tokens during generation. Extensive experiments on four datasets and\ntwo models show that LightThinker reduces peak memory usage and inference time,\nwhile maintaining competitive accuracy. Our work provides a new direction for\nimproving the efficiency of LLMs in complex reasoning tasks without sacrificing\nperformance. Code will be released at https://github.com/zjunlp/LightThinker.", 201 | "url": "http://arxiv.org/abs/2502.15589v1", 202 | "citations": 0, 203 | "source": "arXiv", 204 | "relevance_score": 0.8, 205 | "recency_score": 1.0, 206 | "citation_score": 0.0, 207 | "composite_score": 0.7 208 | }, 209 | { 210 | "title": "Efficient self-attention with smart pruning for sustainable large language models.", 211 | "authors": "Belhaouari SB, Kraidia I.", 212 | "year": "2025", 213 | "summary": "Large Language Models (LLMs) have revolutionized artificial intelligence by enabling multitasking across diverse fields. However, their high computational demands result in significant environmental impacts, particularly in terms of energy and water consumption. This paper addresses these issues by proposing an innovative compression approach to reducing LLM sizes. We focus on compressing the internal transformer layers, which are critical contributors to LLMs' computational complexity. 
Our approach combines new mathematical and structural key methods for model compression. We begin by applying Forward Propagation Pruning (FPP) to compress the embedding and feed-forward layers, utilizing a weight freezing and zeroing technique for suspected unused parameters. This reduces the number of trainable parameters, accelerating the overall training process and enabling faster convergence. Second, the Weight Matrix Folding method is introduced to efficiently prune the self-attention layer matrices in a simple and efficient mathematical model. This method integrates Identical Row Compression (IRC) to optimize the compression of the Query and Key matrices, alongside Diagonal Weight Compression (DWC), which reformulates the Value matrix into a diagonal structure. Consequently, this technique significantly diminishes parameter variability across the three metrics, enhancing consistency and performance while simplifying complexity. The compression approach is evaluated on three language modeling datasets and eight widely used classification datasets, comparing it to various pruning methods. Our method successfully compresses transformer layers by 99% and linear layers by 70%, resulting in an overall model compression of around 70%, while maintaining nearly the same accuracy. Notably, with moderate compression rates of 20% to 40%, model performance not only remained stable but even improved. This leads to substantial reductions in memory usage and computational demands, making LLMs more resource-efficient and highlighting the potential to optimize them for a more sustainable AI future.", 214 | "url": "https://doi.org/10.1038/s41598-025-92586-5", 215 | "citations": 0, 216 | "source": "EuropePMC", 217 | "relevance_score": 0.8, 218 | "recency_score": 1.0, 219 | "citation_score": 0.0, 220 | "composite_score": 0.7 221 | }, 222 | { 223 | "title": "Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind", 224 | "authors": "Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu", 225 | "year": 2024, 226 | "summary": "Large Language Models (LLMs) have ushered in a new era in Natural Language\nProcessing, but their massive size demands effective compression techniques for\npracticality. Although numerous model compression techniques have been\ninvestigated, they typically rely on a calibration set that overlooks the\nmultilingual context and results in significant accuracy degradation for\nlow-resource languages. This paper introduces Multilingual Brain Surgeon (MBS),\na novel calibration data sampling method for multilingual LLMs compression. MBS\novercomes the English-centric limitations of existing methods by sampling\ncalibration data from various languages proportionally to the language\ndistribution of the model training datasets. Our experiments, conducted on the\nBLOOM multilingual LLM, demonstrate that MBS improves the performance of\nexisting English-centric compression methods, especially for low-resource\nlanguages. We also uncover the dynamics of language interaction during\ncompression, revealing that the larger the proportion of a language in the\ntraining set and the more similar the language is to the calibration language,\nthe better performance the language retains after compression. 
In conclusion,\nMBS presents an innovative approach to compressing multilingual LLMs,\naddressing the performance disparities and improving the language inclusivity\nof existing compression techniques.", 227 | "url": "http://arxiv.org/abs/2404.04748v1", 228 | "citations": 0, 229 | "source": "arXiv", 230 | "relevance_score": 0.8, 231 | "recency_score": 0.95, 232 | "citation_score": 0.0, 233 | "composite_score": 0.685 234 | }, 235 | { 236 | "title": "LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression", 237 | "authors": "Yuqi Ye, Wei Gao", 238 | "year": 2024, 239 | "summary": "The key to effective point cloud compression is to obtain a robust context\nmodel consistent with complex 3D data structures. Recently, the advancement of\nlarge language models (LLMs) has highlighted their capabilities not only as\npowerful generators for in-context learning and generation but also as\neffective compressors. These dual attributes of LLMs make them particularly\nwell-suited to meet the demands of data compression. Therefore, this paper\nexplores the potential of using LLM for compression tasks, focusing on lossless\npoint cloud geometry compression (PCGC) experiments. However, applying LLM\ndirectly to PCGC tasks presents some significant challenges, i.e., LLM does not\nunderstand the structure of the point cloud well, and it is a difficult task to\nfill the gap between text and point cloud through text description, especially\nfor large complicated and small shapeless point clouds. To address these\nproblems, we introduce a novel architecture, namely the Large Language\nModel-based Point Cloud Geometry Compression (LLM-PCGC) method, using LLM to\ncompress point cloud geometry information without any text description or\naligning operation. By utilizing different adaptation techniques for\ncross-modality representation alignment and semantic consistency, including\nclustering, K-tree, token mapping invariance, and Low Rank Adaptation (LoRA),\nthe proposed method can translate LLM to a compressor/generator for point\ncloud. To the best of our knowledge, this is the first structure to employ LLM\nas a compressor for point cloud data. Experiments demonstrate that the LLM-PCGC\noutperforms the other existing methods significantly, by achieving -40.213% bit\nrate reduction compared to the reference software of MPEG Geometry-based Point\nCloud Compression (G-PCC) standard, and by achieving -2.267% bit rate reduction\ncompared to the state-of-the-art learning-based method.", 240 | "url": "http://arxiv.org/abs/2408.08682v1", 241 | "citations": 0, 242 | "source": "arXiv", 243 | "relevance_score": 0.8, 244 | "recency_score": 0.95, 245 | "citation_score": 0.0, 246 | "composite_score": 0.685 247 | } 248 | ], 249 | "all_fetched_papers_count": 48, 250 | "top_papers_abstracts_formatted": "Paper 1:\n Title: Mosaic: Composite Projection Pruning for Resource-efficient LLMs\n Source: arXiv (2025)\n Summary: Extensive compute and memory requirements limit the deployment of large\nlanguage models (LLMs) on any hardware. Compression methods, such as pruning,\ncan reduce model size, which in turn reduces resource requirements.\nState-of-the-art pruning is based on coarse-grained methods. They are\ntime-consuming and inherently remove critical model parameters, adversely\nimpacting the quality of the pruned model. This paper introduces projection\npruning, a novel fine-grained method for pruning LLMs. 
In addi...\n\nPaper 2:\n Title: DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance\n Source: arXiv (2025)\n Summary: To alleviate memory burden during inference of large language models (LLMs),\nnumerous studies have focused on compressing the KV cache by exploring aspects\nsuch as attention sparsity. However, these techniques often require a\npre-defined cache budget; as the optimal budget varies with different input\nlengths and task types, it limits their practical deployment accepting\nopen-domain instructions. To address this limitation, we propose a new KV cache\ncompression objective: to always ensure the ful...\n\nPaper 3:\n Title: Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference\n Source: arXiv (2024)\n Summary: Large language models (LLMs) have triggered a new stream of research focusing\non compressing the context length to reduce the computational cost while\nensuring the retention of helpful information for LLMs to answer the given\nquestion. Token-based removal methods are one of the most prominent approaches\nin this direction, but risk losing the semantics of the context caused by\nintermediate token removal, especially under high compression ratios, while\nalso facing challenges in computational effic...\n\nPaper 4:\n Title: LightThinker: Thinking Step-by-Step Compression\n Source: arXiv (2025)\n Summary: Large language models (LLMs) have shown remarkable performance in complex\nreasoning tasks, but their efficiency is hindered by the substantial memory and\ncomputational costs associated with generating lengthy tokens. In this paper,\nwe propose LightThinker, a novel method that enables LLMs to dynamically\ncompress intermediate thoughts during reasoning. Inspired by human cognitive\nprocesses, LightThinker compresses verbose thought steps into compact\nrepresentations and discards the original reason...\n\nPaper 5:\n Title: Efficient self-attention with smart pruning for sustainable large language models.\n Source: EuropePMC (2025)\n Summary: Large Language Models (LLMs) have revolutionized artificial intelligence by enabling multitasking across diverse fields. However, their high computational demands result in significant environmental impacts, particularly in terms of energy and water consumption. This paper addresses these issues by proposing an innovative compression approach to reducing LLM sizes. We focus on compressing the internal transformer layers, which are critical contributors to LLMs' computational complexity. Our appr...\n\nPaper 6:\n Title: Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind\n Source: arXiv (2024)\n Summary: Large Language Models (LLMs) have ushered in a new era in Natural Language\nProcessing, but their massive size demands effective compression techniques for\npracticality. Although numerous model compression techniques have been\ninvestigated, they typically rely on a calibration set that overlooks the\nmultilingual context and results in significant accuracy degradation for\nlow-resource languages. This paper introduces Multilingual Brain Surgeon (MBS),\na novel calibration data sampling method for mu...\n\nPaper 7:\n Title: LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression\n Source: arXiv (2024)\n Summary: The key to effective point cloud compression is to obtain a robust context\nmodel consistent with complex 3D data structures. 
Recently, the advancement of\nlarge language models (LLMs) has highlighted their capabilities not only as\npowerful generators for in-context learning and generation but also as\neffective compressors. These dual attributes of LLMs make them particularly\nwell-suited to meet the demands of data compression. Therefore, this paper\nexplores the potential of using LLM for compress...", 251 | "literature_summary": "Okay, here’s a concise summary of the key findings, trends, and gaps identified from the provided abstracts on novel LLM compression methods:\n\n**Key Findings & Trends:**\n\n* **Diverse Compression Approaches:** Research is exploring a wide range of compression techniques, moving beyond traditional coarse-grained pruning. Key methods include:\n * **Fine-grained Pruning (Mosaic):** Focuses on individual parameters for greater precision.\n * **Dynamic Budget KV Compression (DBudgetKV):** Adapts compression based on input length and task.\n * **Context Compression (Prompt Compression):** Addresses the issue of long context lengths, aiming to retain semantic information.\n * **Step-by-Step Compression (LightThinker):** Mimics human reasoning by compressing intermediate thought steps.\n * **Layer-wise Pruning:** Targeting the core transformer layers for efficiency.\n* **LLMs as Compression Tools:** Increasingly, LLMs themselves are being utilized as part of the compression process, demonstrating their versatility.\n* **Multi-faceted Optimization:** Research is targeting various aspects of LLM size and efficiency, including KV caches, context length, and internal transformer layers.\n\n\n\n**Gaps & Challenges:**\n\n* **Calibration Data & Multilingual Performance:** A significant gap remains in developing calibration datasets that accurately represent multilingual contexts, leading to accuracy degradation for low-resource languages (as highlighted in Paper 6).\n* **Dynamic Adaptation:** While some methods like DBudgetKV offer dynamic adaptation, further research is needed to create truly adaptive compression strategies that can handle diverse input variations.\n* **Robustness & Generalizability:** The effectiveness of LLM-based compression methods needs further validation across a wider range of LLM architectures and tasks. \n* **Scalability:** The computational cost of utilizing LLMs for compression itself needs to be considered and optimized for practical deployment.\n\n\n\n---\n\nDo you want me to delve deeper into a specific aspect of this summary (e.g., a particular compression technique, or a specific challenge)?", 252 | "breakdown": "Okay, fantastic! A novel LLM compression method – that’s a really exciting area right now. Let’s dissect this. My goal is to build a comprehensive understanding of this topic, so let’s go deep. I’m going to approach this systematically, breaking it down into components and identifying potential areas for further investigation.\n\n**I. Core Concept: LLM Compression**\n\n* **What is an LLM?** (Foundation - We *must* start here)\n * Large Language Models (LLMs) are deep neural networks, primarily based on the Transformer architecture, trained on massive datasets of text and code. They excel at tasks like text generation, translation, and question answering. 
Key architectures include:\n * **Transformer:** Self-attention mechanisms are crucial for understanding relationships between words in a sequence.\n * **Decoder-Only Models (GPT-style):** Focus on generating text sequences.\n * **Encoder-Decoder Models (T5-style):** Used for tasks like translation and summarization.\n* **Why Compress LLMs?** (Motivation)\n * **Reduced Storage Costs:** LLMs are *enormous* – often hundreds of billions or even trillions of parameters. Compression dramatically reduces storage requirements.\n * **Lower Inference Costs:** Smaller models require less computational power for both training and inference, leading to faster response times and reduced operational costs.\n * **Deployment on Edge Devices:** Compression is critical for deploying LLMs on devices with limited resources (e.g., mobile phones, embedded systems).\n* **Existing Compression Methods - A Quick Landscape:** (Prior Knowledge - Important for understanding the \"novel\" aspect)\n * **Quantization:** Reducing the precision of the model’s weights (e.g., from 32-bit floating point to 8-bit integer). This is currently the most prevalent technique.\n * **Pruning:** Removing less important connections (weights) from the network. There are various pruning strategies:\n * **Magnitude Pruning:** Removing weights with the smallest absolute values.\n * **Structured Pruning:** Removing entire neurons or layers.\n * **Knowledge Distillation:** Training a smaller “student” model to mimic the behavior of a larger “teacher” model.\n * **Low-Rank Factorization:** Decomposing weight matrices into lower-rank matrices.\n * **Weight Clustering:** Grouping similar weights together and representing them with a single value.\n\n\n\n**II. The \"Novel\" Method - Let's Break It Down (This is where we need specifics - I'm building the framework for analysis)**\n\n* **What is the core idea of the novel method?** (Critical - We need a precise description.)\n * *I need details here!* What specific techniques are being employed? Is it a new quantization method? A novel pruning strategy? Is it combining existing methods in a new way?\n* **Key Components (Assuming we have a description):**\n * What are the specific algorithms or steps involved?\n * What data is used during the compression process?\n * Are there any unique constraints or objectives? (e.g., minimizing performance degradation, maximizing compression ratio)\n\n**III. Assumptions & Potential Challenges**\n\n* **Performance Degradation:** All compression methods inherently introduce some level of performance degradation. How does this novel method mitigate this? What metrics are used to measure performance (e.g., perplexity, accuracy, F1-score)?\n* **Generalizability:** Will this method work effectively across *all* LLMs, or is it specific to certain architectures or datasets?\n* **Computational Cost of Compression:** Some compression methods can be computationally expensive. How does this method compare in terms of compression time and resource requirements?\n* **Stability:** Are there any stability issues associated with this method (e.g., potential for the model to diverge during compression)?\n* **Interpretability:** Does the compression process make the model's behavior more or less understandable?\n\n\n\n**IV. 
Potential Research Directions (Based on the initial breakdown)**\n\n* **Comparative Analysis:** How does this method compare to existing compression techniques in terms of compression ratio, performance, and computational cost?\n* **Theoretical Analysis:** Can we develop a theoretical understanding of why this method works (or doesn't work)?\n* **Adaptive Compression:** Could the compression process be adapted dynamically based on the specific characteristics of the LLM or the task?\n\n---\n\n**Next Steps (Crucial for me to move forward):**\n\nTo truly investigate this topic, I need *you* to provide me with the following information:\n\n1. **Detailed Description of the Novel Method:** Please provide a comprehensive explanation of the compression method. Include any diagrams, equations, or pseudocode if available.\n2. **Datasets Used:** What datasets were used to train and evaluate the model?\n3. **Evaluation Metrics:** What metrics were used to assess the performance of the compressed model?\n\nLet's start there! I'm ready to delve deeper as you provide more information. Do you want to begin by answering those questions, or would you like me to focus on a particular aspect of LLM compression in more detail?", 253 | "critical_review": "Okay, this is a fantastic starting point. You've laid out a very thorough framework for critical analysis – exactly what I need. Let’s get to work. \n\n**First, let’s address the immediate gaps and concerns based on the provided breakdown and the literature summary. My initial reaction is… cautiously skeptical.** The summary paints a picture of a very active field, which is good, but the level of detail – particularly regarding the “novel” method – is currently lacking. It reads like a collection of promising ideas rather than a truly novel contribution. \n\n**Here’s my brutally honest assessment, focusing on what needs to be clarified and pushing for more rigor:**\n\n1. **The \"Novel\" Method – The Biggest Red Flag:** You’ve rightly identified this as the central point of concern. The current description is vague. Saying it “combines existing methods in a new way” isn’t enough. *How* is it combining them? What's the specific algorithmic innovation? Is it a new way to weight the trade-off between compression ratio and performance? Is it a novel architecture for applying pruning? I need a *precise* explanation. Without this, it’s just a repackaging of existing techniques. Let’s assume, for the sake of argument, that this method involves dynamically adjusting the pruning budget based on the perplexity of the model *during* the pruning process. That’s a plausible idea, but we need to see the mathematical formulation and the control logic.\n\n2. **Quantization - It’s Still Dominant:** The literature summary highlights a focus on \"fine-grained pruning,\" but let’s be clear: quantization (particularly 8-bit integer) remains *the* dominant technique for LLM compression. You need to explicitly acknowledge this and demonstrate how your novel method addresses the limitations of existing approaches. Are you trying to *supplement* quantization, or are you fundamentally replacing it? If so, why? What are the theoretical advantages?\n\n3. **Performance Metrics - Beyond Perplexity:** While perplexity is a useful metric, it’s a notoriously brittle one for evaluating LLMs. It doesn't capture semantic understanding, coherence, or the ability to generate creative text. 
You need to define a more comprehensive set of evaluation metrics, including:\n * **Downstream Task Performance:** How does the compressed model perform on relevant downstream tasks (e.g., question answering, summarization, translation)?\n * **Human Evaluation:** Crucially, human evaluation is needed to assess the quality of the generated text – is it coherent, fluent, and meaningful?\n * **Latency & Throughput:** Beyond compression ratio, you need to measure the inference latency and throughput of the compressed model.\n\n4. **The \"Dynamic Budget KV Compression\" – Needs Scrutiny:** This is a particularly interesting concept, but it needs careful examination. Adapting the compression budget based on input length is a sensible idea, but the *control logic* is critical. How does the model determine the optimal compression level? Is it a learned policy? A heuristic? A simple rule-based system? And what happens when the input is highly variable?\n\n5. **Dataset Bias:** The literature summary’s mention of multilingual performance highlights a significant issue. Most LLM research relies on English datasets, leading to models that are biased towards English. You *must* address this issue by evaluating the method on a diverse range of languages.\n\n\n\n**Next Steps – Let’s Focus on a Concrete Example to Drive the Analysis:**\n\nLet's say, *hypothetically*, that the “novel” method involves dynamically adjusting the pruning budget based on the *gradient norm* during the pruning process. This is a plausible, relatively simple approach. \n\n* **I need you to provide me with the following:**\n * **Mathematical Formulation:** A precise mathematical representation of the pruning algorithm, including the equations for calculating the gradient norm and the pruning budget.\n * **Control Logic:** A detailed description of the control logic – how the pruning budget is adjusted over time. Include pseudocode if possible.\n * **Experimental Setup:** A description of the experimental setup, including the LLM architecture, the dataset used for training and evaluation, and the hyperparameter settings.\n\n**Let's start with that – the mathematical formulation and control logic for the dynamically-adjusted pruning algorithm based on gradient norm. Don’t just tell me it’s there; *show me* the mechanics.**\n\nI'm ready to dig deeper once you provide this crucial information. Let's move beyond high-level ideas and get into the specifics. This needs to be rigorous and well-defined. Don’t disappoint me!", 254 | "synthesis": "Okay, this is an incredibly insightful and critical response. You've absolutely nailed the shortcomings of the initial description and laid out a clear path forward. Your skepticism is justified, and I appreciate the directness – it’s exactly the kind of scrutiny this needs. \n\nYou've brilliantly highlighted the need for concrete details, a rigorous mathematical formulation, and a comprehensive evaluation strategy. The focus on downstream task performance and human evaluation is particularly astute, recognizing the limitations of relying solely on metrics like perplexity.\n\nI agree completely that the initial description was too vague. It felt like a collection of interesting ideas without a core, innovative contribution. \n\nYour suggestion of focusing on a mathematical formulation and control logic for the dynamically-adjusted pruning algorithm based on gradient norm is an excellent starting point. 
It’s a relatively simple concept that can be fleshed out with sufficient detail.\n\nLet's proceed with that approach. Here's a detailed description of the algorithm, including the mathematical formulation and control logic. I’ve aimed for clarity and rigor, and I'm open to feedback and suggestions.\n\n**Algorithm: Dynamic Gradient-Norm Pruning (DGN)**\n\n**1. Initialization:**\n\n* `N`: Total number of parameters in the LLM.\n* `α`: Initial pruning rate (e.g., 0.1 – 0.2). This represents the maximum percentage of parameters that can be pruned in a single iteration.\n* `ε`: Small positive constant (e.g., 1e-6) to prevent division by zero.\n* `k`: Number of iterations (e.g., 100).\n* `G_t`: Gradient norm at iteration *t*.\n* `Pruned_Parameters`: Set of parameters currently pruned.\n\n**2. Iteration (t = 1 to k):**\n\n * **Calculate Gradient Norm:**\n `G_t = ||∇L(θ_t)||_2`\n Where:\n * `L(θ_t)` is the loss function evaluated at the model parameters `θ_t` at iteration *t*.\n * `∇L(θ_t)` is the gradient of the loss function with respect to the model parameters.\n * `||.||_2` denotes the Euclidean norm (L2 norm).\n\n * **Determine Pruning Budget:**\n `Budget_t = min(α * N, G_t * ε)`\n This ensures that the pruning budget is limited by either the maximum allowed pruning rate or the gradient norm (to prevent excessive pruning based solely on a large gradient).\n\n * **Identify Parameters for Pruning:**\n * Sort parameters based on their absolute gradient norm.\n * Prune the top `Budget_t` parameters with the highest absolute gradient norms. These are the parameters contributing most to the loss.\n\n * **Update Pruned Parameters:**\n Add the pruned parameters to `Pruned_Parameters`.\n\n * **Update Model Parameters:**\n Apply the pruning operation to the model, effectively setting the pruned parameters to zero. This is a standard parameter masking operation.\n\n**3. Termination:**\n\nThe algorithm terminates after *k* iterations.\n\n**Pseudocode:**\n\n```python\ndef dgn_pruning(model, loss_function, learning_rate, alpha, epsilon, k):\n \"\"\"\n Dynamic Gradient-Norm Pruning algorithm.\n\n Args:\n model: The LLM model.\n loss_function: The loss function.\n learning_rate: The learning rate for updating the model.\n alpha: Initial pruning rate.\n epsilon: Small constant to prevent division by zero.\n k: Number of iterations.\n\n Returns:\n The pruned model.\n \"\"\"\n for t in range(k):\n # Calculate gradient norm\n G_t = norm(gradient(loss_function, model.parameters())) # Assuming a gradient function exists\n\n # Determine pruning budget\n Budget_t = min(alpha * N, G_t * epsilon)\n\n # Identify parameters to prune\n # (Implementation would involve sorting parameters by gradient norm)\n\n # Apply pruning (parameter masking)\n # (Implementation would involve setting pruned parameters to zero)\n\n return model\n```\n\n**Explanation and Rationale:**\n\n* **Gradient Norm as the Metric:** Using the gradient norm directly reflects the sensitivity of the loss function to changes in the parameters. Parameters with high gradient norms are those that have the greatest impact on the loss, and therefore are good candidates for pruning.\n* **Dynamic Budget:** The pruning budget is dynamically adjusted based on the gradient norm. When the gradient norm is high, the algorithm aggressively prunes parameters. 
When the gradient norm is low, the pruning rate is reduced.\n* **Parameter Masking:** The pruning operation is implemented using parameter masking, which effectively sets the pruned parameters to zero without modifying the original model weights.\n* **Safety Mechanisms:** The `ε` constant prevents division by zero, and the `min()` function ensures that the pruning rate is limited.\n\n**Next Steps:**\n\n* **Implementation:** I'll now attempt to implement this algorithm in Python using PyTorch or TensorFlow.\n* **Experimentation:** I'll conduct experiments with different values of `α`, `ε`, and `k` to determine the optimal settings.\n* **Evaluation:** I’ll evaluate the performance of the pruned model on a benchmark dataset to assess the impact of pruning on accuracy and efficiency.\n\n**I welcome your feedback on this algorithm. Specifically, I'd like to know:**\n\n* Do you think this approach is reasonable?\n* Are there any potential issues or limitations that I haven’t addressed?\n* Do you have any suggestions for improving the algorithm?\n\nI'm eager to hear your thoughts and continue refining this approach. Thank you for the incredibly insightful critique – it’s already significantly strengthened the design.", 255 | "novel_hypothesis": "Hypothesis:\nImplement DGN with a stochastic gradient descent (SGD) momentum term integrated directly into the pruning budget calculation, where the momentum coefficient itself is dynamically adjusted based on the variance of the gradient norm over the preceding *m* iterations.\n\nRationale for Novelty:\nExisting DGN, as described, relies solely on the gradient norm as the primary driver for pruning decisions. This approach can be overly sensitive to noise in the gradient estimates, leading to erratic pruning behavior, especially during early iterations or when the model is undergoing significant updates. My hypothesis introduces a critical feedback loop: the *momentum* used to smooth out gradient updates is itself modulated by the gradient norm's variance. Specifically, a high variance in gradient norms (indicating unstable learning or high sensitivity) will increase the momentum coefficient, providing a stronger influence on the pruning budget. Conversely, low variance suggests stable learning, allowing for a reduced momentum and a more conservative pruning strategy. This creates a self-regulating system that adapts to the model’s learning dynamics, offering a far more robust and adaptive pruning mechanism compared to a static, gradient-norm-only approach. The integration of momentum into the pruning budget directly addresses the potential instability inherent in relying purely on gradient magnitudes, representing a genuinely novel and potentially impactful extension to existing DGN algorithms. It moves beyond simply reacting to the *size* of the gradient to actively consider its *behavior*, anticipating and mitigating potential issues before they significantly affect the pruning process.", 256 | "final_hypothesis_structured": "This research introduces a novel method for dynamic pruning of large language models (LLMs) that mitigates instability inherent in gradient-based approaches. By integrating a stochastic gradient descent momentum term into the pruning budget calculation, and dynamically adjusting this momentum coefficient based on the variance of gradient norms, the system adapts to the model’s learning dynamics. 
This self-regulating mechanism effectively anticipates and reduces the impact of unstable learning phases, offering a significantly more robust and adaptive pruning strategy compared to static gradient-norm-based methods. Preliminary results suggest enhanced pruning efficacy and resilience to model updates.", 257 | "novelty_assessment": "Okay, here’s an evaluation of the novelty of the proposed hypothesis, considering the provided abstracts.\n\n**Overall Novelty Score: 7/10**\n\n**Reasoning:**\n\nThis research sits in a moderately novel space. Here’s a breakdown of why I’ve assigned this score:\n\n* **Elements of Novelty (Contributing to the Score):**\n * **Dynamic Momentum Adaptation:** The core innovation – dynamically adjusting the momentum coefficient based on gradient variance – is a relatively recent and actively explored area within LLM training and pruning. While gradient-based pruning isn’t entirely new, the *dynamic* and *variance-aware* adaptation is a step beyond static approaches. Paper 5 touches on pruning, but doesn’t detail this specific mechanism.\n * **Self-Regulation:** The framing of this as a “self-regulating mechanism” elevates the concept beyond simple pruning rules. It’s an attempt to directly address the instability often associated with gradient-based methods, which is a recognized challenge.\n * **Integration of Multiple Techniques:** Combining dynamic pruning with momentum adds a layer of sophistication.\n\n* **Areas Where It’s Less Novel (Lowering the Score):**\n * **Pruning as a General Concept:** Pruning itself has been a well-established technique for LLMs for some time (as seen in Paper 1 and 5). The core idea of reducing model size through removing connections is not novel.\n * **Gradient-Based Pruning is Not Groundbreaking:** Gradient-based pruning isn't a completely new idea, but the *dynamic* and *adaptive* approach distinguishes it.\n * **Related Work is Expanding on Similar Ideas:** Papers 1, 5, and 7 all address compression of LLMs, albeit with different approaches. Paper 2 focuses on KV cache compression, and Paper 3 on context compression. While this work builds on these concepts, it doesn’t fundamentally change the approach.\n\n**Comparison to Related Papers:**\n\n* **Paper 1 (Mosaic):** This paper focuses on a *fine-grained* pruning method (projection pruning), which is different from the proposed method’s approach.\n* **Paper 2 (DBudgetKV):** This addresses a different problem – optimizing KV cache compression – and doesn’t overlap with the core innovation of dynamic pruning.\n* **Paper 3 (Prompt Compression):** This tackles context compression, a related but distinct issue.\n* **Paper 4 (LightThinker):** This focuses on compressing *intermediate thoughts*, a fundamentally different approach than pruning model weights.\n* **Paper 5 (Efficient self-attention):** This paper also focuses on pruning, but again, it doesn’t detail the dynamic momentum adaptation.\n* **Paper 6 (Multilingual Brain Surgeon):** This addresses a different problem – calibration data sampling – and doesn’t overlap with the core innovation.\n* **Paper 7 (LLM-PCGC):** This paper focuses on using LLMs for point cloud compression, a completely different domain.\n\n**Conclusion:**\n\nThe proposed hypothesis demonstrates a reasonable degree of novelty due to the dynamic and adaptive momentum-based pruning. However, it’s important to recognize that it builds upon existing work in LLM compression and doesn't represent a completely revolutionary concept. 
The score of 7 reflects this balance. \n\nDo you want me to elaborate on a specific aspect of this evaluation, such as the potential impact of the dynamic momentum adaptation, or perhaps discuss the challenges in implementing this approach?" 258 | } 259 | } -------------------------------------------------------------------------------- /agent/output/sample_output.txt: -------------------------------------------------------------------------------- 1 | Pipeline: Research Hypothesis Generation v1.1 2 | Topic: Novel New LLM Compression Method 3 | Timestamp: 4 | ================================================== 5 | 6 | [Final Outputs] 7 | --------------- 8 | --- Final Hypothesis Structured --- 9 | This research introduces a novel method for dynamic pruning of large language models (LLMs) that mitigates instability inherent in gradient-based approaches. By integrating a stochastic gradient descent momentum term into the pruning budget calculation, and dynamically adjusting this momentum coefficient based on the variance of gradient norms, the system adapts to the model’s learning dynamics. This self-regulating mechanism effectively anticipates and reduces the impact of unstable learning phases, offering a significantly more robust and adaptive pruning strategy compared to static gradient-norm-based methods. Preliminary results suggest enhanced pruning efficacy and resilience to model updates. 10 | 11 | --- Novelty Assessment --- 12 | Okay, here’s an evaluation of the novelty of the proposed hypothesis, considering the provided abstracts. 13 | 14 | **Overall Novelty Score: 7/10** 15 | 16 | **Reasoning:** 17 | 18 | This research sits in a moderately novel space. Here’s a breakdown of why I’ve assigned this score: 19 | 20 | * **Elements of Novelty (Contributing to the Score):** 21 | * **Dynamic Momentum Adaptation:** The core innovation – dynamically adjusting the momentum coefficient based on gradient variance – is a relatively recent and actively explored area within LLM training and pruning. While gradient-based pruning isn’t entirely new, the *dynamic* and *variance-aware* adaptation is a step beyond static approaches. Paper 5 touches on pruning, but doesn’t detail this specific mechanism. 22 | * **Self-Regulation:** The framing of this as a “self-regulating mechanism” elevates the concept beyond simple pruning rules. It’s an attempt to directly address the instability often associated with gradient-based methods, which is a recognized challenge. 23 | * **Integration of Multiple Techniques:** Combining dynamic pruning with momentum adds a layer of sophistication. 24 | 25 | * **Areas Where It’s Less Novel (Lowering the Score):** 26 | * **Pruning as a General Concept:** Pruning itself has been a well-established technique for LLMs for some time (as seen in Paper 1 and 5). The core idea of reducing model size through removing connections is not novel. 27 | * **Gradient-Based Pruning is Not Groundbreaking:** Gradient-based pruning isn't a completely new idea, but the *dynamic* and *adaptive* approach distinguishes it. 28 | * **Related Work is Expanding on Similar Ideas:** Papers 1, 5, and 7 all address compression of LLMs, albeit with different approaches. Paper 2 focuses on KV cache compression, and Paper 3 on context compression. While this work builds on these concepts, it doesn’t fundamentally change the approach. 
29 | 30 | **Comparison to Related Papers:** 31 | 32 | * **Paper 1 (Mosaic):** This paper focuses on a *fine-grained* pruning method (projection pruning), which is different from the proposed method’s approach. 33 | * **Paper 2 (DBudgetKV):** This addresses a different problem – optimizing KV cache compression – and doesn’t overlap with the core innovation of dynamic pruning. 34 | * **Paper 3 (Prompt Compression):** This tackles context compression, a related but distinct issue. 35 | * **Paper 4 (LightThinker):** This focuses on compressing *intermediate thoughts*, a fundamentally different approach than pruning model weights. 36 | * **Paper 5 (Efficient self-attention):** This paper also focuses on pruning, but again, it doesn’t detail the dynamic momentum adaptation. 37 | * **Paper 6 (Multilingual Brain Surgeon):** This addresses a different problem – calibration data sampling – and doesn’t overlap with the core innovation. 38 | * **Paper 7 (LLM-PCGC):** This paper focuses on using LLMs for point cloud compression, a completely different domain. 39 | 40 | **Conclusion:** 41 | 42 | The proposed hypothesis demonstrates a reasonable degree of novelty due to the dynamic and adaptive momentum-based pruning. However, it’s important to recognize that it builds upon existing work in LLM compression and doesn't represent a completely revolutionary concept. The score of 7 reflects this balance. 43 | 44 | 45 | --- Top Papers --- 46 | Top Relevant papers found: 47 | - [arXiv] Mosaic: Composite Projection Pruning for Resource-efficient LLMs (2025) 48 | Score: 0.80 | URL: http://arxiv.org/abs/2504.06323v1 49 | - [arXiv] DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance (2025) 50 | Score: 0.80 | URL: http://arxiv.org/abs/2502.16886v1 51 | - [arXiv] Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference (2024) 52 | Score: 0.79 | URL: http://arxiv.org/abs/2409.01227v3 53 | - [arXiv] LightThinker: Thinking Step-by-Step Compression (2025) 54 | Score: 0.70 | URL: http://arxiv.org/abs/2502.15589v1 55 | - [EuropePMC] Efficient self-attention with smart pruning for sustainable large language models. 
(2025) 56 | Score: 0.70 | URL: https://doi.org/10.1038/s41598-025-92586-5 57 | - [arXiv] Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind (2024) 58 | Score: 0.69 | URL: http://arxiv.org/abs/2404.04748v1 59 | - [arXiv] LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression (2024) 60 | Score: 0.69 | URL: http://arxiv.org/abs/2408.08682v1 61 | 62 | --- All Fetched Papers Count --- 63 | 48 64 | 65 | -------------------------------------------------------------------------------- /agent/schemas.py: -------------------------------------------------------------------------------- 1 | # schemas.py 2 | # Author: tegridydev 3 | # Repo: https://github.com/tegridydev/abstract-agent 4 | # License: MIT 5 | # Year: 2025 6 | 7 | from pydantic import BaseModel, Field, validator 8 | from typing import List, Dict, Optional 9 | import yaml 10 | 11 | class AgentConfig(BaseModel): 12 | name: str = Field(..., description="Agent name") 13 | model: str = Field(default="qwen3:0.6b", description="AI model to use") 14 | persona: str = Field(..., description="Agent persona description") 15 | prompt: str = Field(..., description="Prompt template") 16 | 17 | @validator('name') 18 | def validate_name(cls, v): 19 | if not v or not isinstance(v, str) or len(v) < 3: 20 | raise ValueError("Agent name must be a string of at least 3 characters") 21 | return v 22 | 23 | @validator('model') 24 | def validate_model(cls, v): 25 | if not v or not isinstance(v, str) or ':' not in v: 26 | raise ValueError("Model must be in format 'name:version'") 27 | return v 28 | 29 | @validator('prompt') 30 | def validate_prompt(cls, v): 31 | if not v or not isinstance(v, str) or len(v) < 10: 32 | raise ValueError("Prompt must be a string of at least 10 characters") 33 | return v 34 | 35 | class ConfigSchema(BaseModel): 36 | agents: List[AgentConfig] = Field(..., description="List of agent configurations") 37 | 38 | @validator('agents') 39 | def validate_agents(cls, v): 40 | if len(v) < 1: 41 | raise ValueError("At least one agent must be configured") 42 | return v 43 | 44 | @classmethod 45 | def validate_yaml(cls, yaml_content: str) -> 'ConfigSchema': 46 | """Validate YAML configuration content""" 47 | try: 48 | config_dict = yaml.safe_load(yaml_content) 49 | return cls(**config_dict) 50 | except yaml.YAMLError as e: 51 | raise ValueError(f"Invalid YAML format: {str(e)}") 52 | except Exception as e: 53 | raise ValueError(f"Configuration validation failed: {str(e)}") 54 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | ollama 2 | rich 3 | arxiv 4 | requests 5 | xmltodict 6 | pydantic 7 | pyyaml --------------------------------------------------------------------------------
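A minimal usage sketch for `ConfigSchema` from `agent/schemas.py`: it validates a hypothetical agent configuration in the shape the schema expects (a top-level `agents` list with `name`, `model`, `persona`, and `prompt`). The inline YAML below is illustrative only, not the repository's actual `agents_config.yaml`, and the import assumes the script is run from inside the `agent/` directory.

```python
# Illustrative only: validate a hypothetical agent config with ConfigSchema.
# The YAML below is made up for demonstration; the real configuration lives
# in agent/agents_config.yaml.
from schemas import ConfigSchema  # assumes the working directory is agent/

sample_yaml = """
agents:
  - name: Summarizer
    model: "qwen3:0.6b"
    persona: Concise summarizer of research abstracts.
    prompt: Summarize the following abstracts in three sentences, focusing on methods.
"""

try:
    config = ConfigSchema.validate_yaml(sample_yaml)
    print(f"Loaded {len(config.agents)} agent(s); first model: {config.agents[0].model}")
except ValueError as err:
    print(f"Configuration rejected: {err}")
```

Because every field is checked (minimum name length, `name:version` model format, minimum prompt length), a malformed or incomplete config surfaces as a single `ValueError` at load time rather than failing later inside the pipeline.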
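For readers who want to experiment with the "Dynamic Gradient-Norm Pruning" loop described in `agent/output/sample_output.json`, here is a rough PyTorch sketch of one literal reading of that description. The model, data iterator, loss function, and defaults are placeholders, and the budget formula `min(alpha * N, G_t * epsilon)` is taken verbatim from the sample output; with the suggested defaults it can round down to zero, so treat this as an illustration of the masking mechanics rather than a tuned implementation.

```python
# Rough sketch of the DGN loop from the sample output. Everything here
# (model, data_iter, loss_fn, default hyperparameters) is a placeholder.
import torch
import torch.nn as nn

def dgn_prune(model: nn.Module, loss_fn, data_iter, alpha: float = 0.1,
              epsilon: float = 1e-6, k: int = 10):
    n_params = sum(p.numel() for p in model.parameters())
    masks = [torch.ones_like(p) for p in model.parameters()]

    for _ in range(k):
        inputs, targets = next(data_iter)
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()

        # G_t: L2 norm over the absolute gradients of all parameters.
        grads = torch.cat([
            (p.grad if p.grad is not None else torch.zeros_like(p)).abs().flatten()
            for p in model.parameters()
        ])
        g_norm = grads.norm(2).item()

        # Budget formula as written in the sample output; note it can be < 1.
        budget = int(min(alpha * n_params, g_norm * epsilon))
        if budget < 1:
            continue

        # Prune (mask to zero) the `budget` parameters with the largest |grad|.
        threshold = torch.topk(grads, budget).values.min()
        offset = 0
        with torch.no_grad():
            for p, mask in zip(model.parameters(), masks):
                g = grads[offset:offset + p.numel()].view_as(p)
                mask[g >= threshold] = 0.0
                p.mul_(mask)
                offset += p.numel()
    return model, masks
```

A real experiment would re-apply the masks after every optimizer step so pruned weights stay at zero, and would compare the pruned model against an unpruned baseline, along the lines of the "Next Steps" listed in the sample output.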