├── README.md ├── main.py └── requirements.txt /README.md: -------------------------------------------------------------------------------- 1 | # CewlAI 2 | 3 | 4 | 5 | CewlAI is a domain generation tool that uses Google's Gemini AI to create potential domain variations based on seed domains. It is inspired by DNSCewl, which was in turn inspired by CeWL, but this tool focuses on domain name pattern recognition and generation. 6 | 7 | ## Features 8 | 9 | - Generate domain variations using AI pattern recognition 10 | - Support for a single domain or a list of domains as input 11 | - Control over token usage and iteration count 12 | - Output results to file or console 13 | - Duplicate prevention 14 | - Domain count limiting 15 | - Verbose mode for debugging 16 | 17 | ## Prerequisites 18 | 19 | - Python 3.x 20 | - Google API key for Gemini AI 21 | 22 | ## Installation 23 | 24 | 1. Clone the repository: 25 | `git clone https://github.com/jthack/cewlai.git` 26 | `cd cewlai` 27 | 28 | 2. Install the required packages: 29 | `pip install -r requirements.txt` 30 | 31 | 3. Set up your Google API key (used by the default Gemini model): 32 | `export GEMINI_API_KEY='your-api-key-here'` 33 | 34 | 4. (Optional) Set up your OpenAI API key (only needed with `-m openai`): 35 | `export OPENAI_API_KEY='your-api-key-here'` 36 | 37 | 5. (Optional) Set up your White Rabbit Neo API key (only needed with `-m whiterabbitneo`): 38 | - Get your API key from [Kindo AI](https://app.kindo.ai/) 39 | `export KINDO_API_KEY='your-api-key-here'` 40 | 41 | ## Input Methods 42 | 43 | The tool supports multiple ways to provide seed domains: 44 | 45 | 1. Single domain via command line: 46 | ``` 47 | python main.py -t example.com 48 | ``` 49 | 50 | 2. List of domains from a file: 51 | ``` 52 | python main.py -tL domains.txt 53 | ``` 54 | 55 | 3. Domains from stdin (pipe or redirect): 56 | ``` 57 | cat domains.txt | python main.py 58 | # or 59 | echo "example.com" | python main.py 60 | ``` 61 | 62 | Note: When using stdin, the token usage confirmation is automatically skipped. 63 | 64 | 65 | ## Token Management 66 | 67 | The tool automatically manages token usage to stay within API limits: 68 | 69 | - Input is automatically truncated if it exceeds 100,000 tokens 70 | - Use the `-v` flag to see when truncation occurs 71 | - Token usage estimates are shown before processing begins 72 | - Use `--force` to skip the token usage confirmation prompt 73 | 74 | Example output with truncation: 75 | 76 | ``` 77 | $ cat large_domain_list.txt | python main.py -v 78 | 79 | [!] Input truncated to 15423 domains to stay under token limit 80 | 81 | Estimated token usage: 82 | * Per iteration: ~98750 tokens 83 | * Total for 1 loops: ~98750 tokens 84 | 85 | Continue? [y/N] 86 | ``` 87 | 
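The truncation shown above mirrors the logic of `estimate_tokens()` in `main.py`. Here is a simplified sketch of that idea (the real function also counts the tokens of the surrounding prompt template, so treat this as illustrative rather than exact):

```python
import tiktoken

MAX_TOKENS = 100_000
enc = tiktoken.get_encoding("cl100k_base")  # OpenAI encoding, used as an approximation

def truncate_domains(domains):
    """Keep only as many seed domains as fit under the token ceiling."""
    kept, used = [], 0
    for domain in domains:
        cost = len(enc.encode(domain + ", "))  # what this domain adds to the prompt
        if used + cost > MAX_TOKENS:
            break  # drop this domain and everything after it
        kept.append(domain)
        used += cost
    return kept
```
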
88 | ## Usage 89 | 90 | Basic usage: 91 | python main.py -t example.com 92 | 93 | Using a list of domains: 94 | python main.py -tL domains.txt 95 | 96 | Common options: 97 | python main.py -tL domains.txt --loop 3 --limit 1000 -o output.txt 98 | 99 | ### Arguments 100 | 101 | ``` 102 | -t, --target: Specify a single seed domain 103 | -tL, --target-list: Input file containing seed domains (one per line) 104 | --loop: Number of AI generation iterations (default: 1) 105 | --limit: Maximum number of domains to generate (0 = unlimited) 106 | -o, --output: Write results to specified file 107 | -v, --verbose: Enable verbose output 108 | --no-repeats: Prevent duplicate domains across iterations 109 | --force: Skip token usage confirmation 110 | -m, --model: Specify model to use (gemini, openai, whiterabbitneo, ollama; default: gemini) 111 | -c, --model-config: Path to a JSON model config file (default: ~/.cewlai-model-config) 112 | ``` 113 | 114 | ## Models 115 | 116 | ### Gemini (Default) 117 | The default model uses Google's Gemini AI. Requires `GEMINI_API_KEY` to be set. 118 | 119 | ### OpenAI 120 | Use OpenAI's models by specifying `-m openai`. Requires `OPENAI_API_KEY` to be set. 121 | 122 | ### White Rabbit Neo 123 | Use White Rabbit Neo by specifying `-m whiterabbitneo`. This model is powered by Kindo AI's WhiteRabbitNeo-33B-DeepSeekCoder model. Requires `KINDO_API_KEY` to be set. 124 | 125 | Example usage: 126 | ```bash 127 | export KINDO_API_KEY='your-api-key-here' 128 | python main.py -m whiterabbitneo -t example.com 129 | ``` 130 | 131 | ### Ollama 132 | Use local Ollama models by specifying `-m ollama`. Requires Ollama to be installed with at least one model; the first installed model is selected automatically. 133 | 134 | ## Examples 135 | 136 | Main use case (Unix-style): 137 | `cat domains.txt | python main.py` 138 | 139 | Generate domains based on a single target: 140 | `python main.py -t example.com -o results.txt` 141 | 142 | Generate domains from a list with multiple iterations: 143 | `python main.py -tL company_domains.txt --loop 3 --limit 1000 -o generated_domains.txt` 144 | 145 | Verbose output with no repeats: 146 | `python main.py -t example.com -v --no-repeats` 147 | 148 | Using White Rabbit Neo with verbose output: 149 | `python main.py -m whiterabbitneo -t example.com -v` 150 | 151 | ## Output 152 | 153 | The tool will generate new domains based on patterns it recognizes in your seed domains. Output can be directed to: 154 | - Console (default) 155 | - File (using the -o option) 156 | 157 | Only newly generated domains are shown in the output (seed domains are excluded). 158 | 159 | ## Advanced Usage 160 | 161 | ### Input File Format 162 | 163 | When using -tL, your input file should contain one domain per line: 164 | example.com 165 | subdomain.example.com 166 | another-example.com 167 | 168 | ### Output Format 169 | 170 | The output is a simple list of generated domains, one per line: 171 | api.example.com 172 | dev.example.com 173 | staging.example.com 174 | test.example.com 175 | 176 | ### Verbose Output 177 | 178 | Using -v provides detailed information about the generation process: 179 | ``` 180 | [+] LLM Generation Loop 1/3... 181 | [DEBUG] LLM suggested 50 new domain(s). 45 were added (others were duplicates?) 182 | [DEBUG] Original domains: 10 183 | [DEBUG] New domains generated: 45 184 | [DEBUG] Total domains processed: 55 185 | ``` 186 | 187 | ## How It Works 188 | 189 | 1. Seed Collection: The tool takes your input domains as seeds 190 | 2. 
AI Analysis: The selected model (Gemini, OpenAI, White Rabbit Neo, or Ollama) analyzes patterns in the seed domains 191 | 3. Generation: New domains are generated based on recognized patterns 192 | 4. Filtering: Results are filtered to remove duplicates and invalid formats 193 | 5. Output: Unique, new domains are presented in the specified format 194 | 195 | Remember that this tool is meant for legitimate security testing and research purposes only. -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import argparse 4 | import json 5 | import os 6 | import random 7 | import sys 8 | import logging 9 | import warnings 10 | from abc import ABC, abstractmethod 11 | 12 | import ollama 13 | import tiktoken 14 | import re 15 | 16 | 17 | """ 18 | (?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\\.)+ matches one or more subdomain segments 19 | 20 | Each segment must start and end with alphanumeric characters 21 | Can optionally contain hyphens in the middle 22 | Limited to 63 characters per segment (as per DNS rules) 23 | Must end with a dot 24 | 25 | [a-zA-Z]{2,} matches the top-level domain (TLD) with at least 2 characters 26 | """ 27 | DOMAIN_REGEX = r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}' 28 | 29 | """ 30 | Custom model config for whichever model the user selected to use for domain generation. If this dict is empty, the 31 | code assumes a set of defaults which should work. 32 | """ 33 | CUSTOM_MODEL_CONFIG: dict = {} 34 | 35 | 36 | class LLM(ABC): 37 | """This abstract class defines the interface for LLMs. To support an LLM, implement the methods here.""" 38 | 39 | @abstractmethod 40 | def __init__(self, default_model_config: dict): 41 | """Initializer with a default config dict, this is where you initialize the model""" 42 | 43 | if not default_model_config.get("model_name"): 44 | raise ValueError("default_model_config must contain model_name") 45 | 46 | self.default_model_config = CUSTOM_MODEL_CONFIG if len(CUSTOM_MODEL_CONFIG) > 0 else default_model_config 47 | self.model_name = self.default_model_config.pop("model_name") 48 | self.chat_history = [] 49 | 50 | @abstractmethod 51 | def chat(self, prompt: str): 52 | """Defines how to send chat messages to the configured LLM""" 53 | pass 54 | 55 | 56 | class Gemini(LLM): 57 | """Implementation for using Google Gemini as the subdomain generator""" 58 | 59 | def __init__(self, default_model_config: dict): 60 | import google.generativeai as genai 61 | super().__init__(default_model_config) 62 | warnings.filterwarnings('ignore') 63 | logging.getLogger().setLevel(logging.ERROR) 64 | try: 65 | genai.configure(api_key=os.environ["GEMINI_API_KEY"]) 66 | except KeyError as e: 67 | print("No Gemini API key set via a `GEMINI_API_KEY` env var. 
Please do so before using this model type.", 68 | file=sys.stderr) 69 | sys.exit(1) 70 | 71 | self.model = genai.GenerativeModel( 72 | model_name=self.model_name, 73 | generation_config=self.default_model_config, 74 | ) 75 | self.chat_session = self.model.start_chat(history=self.chat_history) 76 | 77 | def chat(self, prompt: str): 78 | return self.chat_session.send_message(prompt).text 79 | 80 | 81 | class OpenAI(LLM): 82 | """Implementation for using OpenAI as the subdomain generator""" 83 | 84 | def __init__(self, default_model_config: dict): 85 | import openai 86 | super().__init__(default_model_config) 87 | self.chat_history = [ 88 | {"role": "system", "content": "You are tasked with thinking of similar domains."} 89 | ] 90 | try: 91 | self.model = openai.OpenAI() 92 | except openai.OpenAIError as e: 93 | print(e, file=sys.stderr) 94 | sys.exit(1) 95 | 96 | def chat(self, prompt: str): 97 | from openai import NOT_GIVEN 98 | self.chat_history.append({"role": "user", "content": prompt}) 99 | response = self.model.chat.completions.create( 100 | model=self.model_name, 101 | messages=self.chat_history, 102 | temperature=self.default_model_config.get("temperature", NOT_GIVEN), 103 | top_p=self.default_model_config.get("top_p", NOT_GIVEN), 104 | max_tokens=self.default_model_config.get("max_tokens", NOT_GIVEN), 105 | stream=False, 106 | ) 107 | self.chat_history.append({"role": "assistant", "content": response.choices[0].message.content}) 108 | return response.choices[0].message.content 109 | 110 | 111 | class Ollama(LLM): 112 | """Implementation for using a local Ollama instance as the subdomain generator""" 113 | 114 | def __init__(self, default_model_config: dict): 115 | super().__init__(default_model_config) 116 | 117 | def chat(self, prompt: str): 118 | from ollama import chat 119 | from ollama import ChatResponse 120 | self.chat_history += [ 121 | {'role': 'user', 'content': prompt}, 122 | ] 123 | response: ChatResponse = chat( 124 | model=self.model_name, 125 | messages=self.chat_history, 126 | options={ 127 | 'temperature': self.default_model_config.get("temperature", None), 128 | 'top_p': self.default_model_config.get("top_p", None), 129 | 'top_k': self.default_model_config.get("top_k", None), 130 | }) 131 | self.chat_history += [ 132 | {'role': 'assistant', 'content': response.message.content}, 133 | ] 134 | return response.message.content 135 | 136 | 137 | def parse_model_config(model_config_path: str): 138 | """ 139 | Parses a model config file, if it exists, for use configuring the user provided LLM type. 140 | If no config values exist, opinionated defaults are used. If the user selected to use Ollama, 141 | an attempt is made to get the list of installed models and pick the first one. If that attempt fails 142 | an error will be thrown. 
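An example config file (a sketch; `model_name` is required because LLM.__init__ pops it from the dict, and the remaining keys are passed through as the selected model's generation settings):

    {
      "model_name": "gemini-1.5-flash",
      "temperature": 1,
      "top_p": 0.95,
      "max_output_tokens": 8192
    }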
143 | """ 144 | global CUSTOM_MODEL_CONFIG 145 | if model_config_path == ".cewlai-model-config": 146 | from pathlib import Path 147 | home = Path.home() 148 | model_config_path = home / model_config_path 149 | 150 | try: 151 | with open(model_config_path) as model_config: 152 | data = model_config.read() 153 | try: 154 | CUSTOM_MODEL_CONFIG = json.loads(data) 155 | print(f"Using the custom model config values: \n{json.dumps(CUSTOM_MODEL_CONFIG, indent=2)}") 156 | except json.decoder.JSONDecodeError as e: 157 | print(f"Could not parse model config file at {model_config_path} because: {e} \n ") 158 | except OSError as e: 159 | error_message = os.strerror(e.errno) 160 | print(f"Could not open model config file at {model_config_path} because: {error_message} \n ") 161 | 162 | if len(CUSTOM_MODEL_CONFIG) == 0: 163 | print( 164 | f'No model config values specified. Using opinionated defaults for known model types which can be found in the code.') 165 | 166 | 167 | def parse_args(): 168 | """ 169 | Parses command-line arguments, providing a style similar to DNSCewl. 170 | We won't implement all the DNSCewl functionality here; just basic 171 | demonstration arguments plus the loop count & limit. 172 | """ 173 | parser = argparse.ArgumentParser( 174 | description="DNS-like domain generation script with LLM integration." 175 | ) 176 | 177 | # Mimic a few DNSCewl-style arguments 178 | parser.add_argument("-t", "--target", type=str, 179 | help="Specify a single seed domain.") 180 | parser.add_argument("-tL", "--target-list", type=str, 181 | help="Specify a file with seed domains (one per line).") 182 | parser.add_argument("--loop", type=int, default=1, 183 | help="Number of times to call the LLM in sequence.") 184 | parser.add_argument("--limit", type=int, default=0, 185 | help="Stop once we exceed this many total domains (0 means no limit).") 186 | parser.add_argument("-v", "--verbose", action="store_true", 187 | help="Enable verbose output.") 188 | parser.add_argument("--no-repeats", action="store_true", 189 | help="Ensure no repeated domain structures in final output.") 190 | parser.add_argument("-o", "--output", type=str, 191 | help="Output file to write results to.") 192 | parser.add_argument("--force", action="store_true", 193 | help="Skip token usage confirmation.") 194 | parser.add_argument("-m", "--model", type=str, 195 | help="Specify a model to use. Supports `gemini`, `openai`,`ollama`, `whiterabbitneo`", 196 | default="gemini") 197 | parser.add_argument("-c", "--model-config", type=str, 198 | help="Location of model config file. Uses ~/.cewlai-model-config by default.", 199 | default=".cewlai-model-config") 200 | 201 | return parser.parse_args() 202 | 203 | 204 | def get_seed_domains(args): 205 | """ 206 | Retrieves initial seed domains from arguments (either -t, -tL, or stdin). 
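Domains from all three sources are merged into a single set, so duplicates are dropped and input order is not preserved.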
207 | """ 208 | seed_domains = set() 209 | 210 | # Check if we have data on stdin 211 | if not sys.stdin.isatty(): 212 | for line in sys.stdin: 213 | line = line.strip() 214 | if line: 215 | seed_domains.add(line) 216 | 217 | # Single target if provided 218 | if args.target: 219 | seed_domains.add(args.target.strip()) 220 | 221 | # File-based targets 222 | if args.target_list: 223 | with open(args.target_list, "r") as f: 224 | for line in f: 225 | line = line.strip() 226 | if line: 227 | seed_domains.add(line) 228 | 229 | return list(seed_domains) 230 | 231 | 232 | def generate_new_domains(chat_session, domain_list, verbose=False): 233 | """ 234 | Given an LLM chat session and a domain list, prompt the LLM to produce new domains. 235 | We randomize domain_list first, then craft a prompt asking for predicted variations. 236 | """ 237 | try: 238 | # Randomize order of domains before sending to the model 239 | shuffled_domains = domain_list[:] 240 | random.shuffle(shuffled_domains) 241 | 242 | # Build the system / content prompt 243 | prompt_text = ( 244 | "Here is a list of domains:\n" 245 | f"{', '.join(shuffled_domains)}\n\n" 246 | "It's your job to output unique new domains that are likely to exist " 247 | "based on variations or predictive patterns you see in the existing list. " 248 | "In your output, none of the domains should repeat. " 249 | "Please output them one domain per line. Only output the domains, no other text. No paths. No protocols. Output should be a list of domains. If they only pass in a root domain, still output potential subdomains. For example, if they pass in 'google.com', output 'mail.google.com', 'drive.google.com', 'docs.google.com', etc. However if they pass in: sub.example.com\nsub2.example.com\nsub-a.example.com then output variations like sub4.example.com\nsub5.example.com\nsub-b.example.com. \n" 250 | ) 251 | 252 | if verbose: 253 | print("[DEBUG] Prompt to LLM:") 254 | print(prompt_text) 255 | print() 256 | 257 | raw_output = chat_session.chat(prompt=prompt_text).strip() 258 | 259 | new_candidates = re.findall(DOMAIN_REGEX, raw_output) 260 | return new_candidates 261 | 262 | except Exception as e: 263 | print(f"\n[!] Error during domain generation: {str(e)}", file=sys.stderr) 264 | return [] 265 | 266 | 267 | def estimate_tokens(domain_list): 268 | """ 269 | Provides a token estimate using tiktoken (cl100k_base, used as an approximation for non-OpenAI models). 270 | Returns truncated domain list and token count. 271 | """ 272 | MAX_TOKENS = 100000 273 | enc = tiktoken.get_encoding("cl100k_base") # Using OpenAI's encoding as approximation 274 | 275 | # Build the prompt template without domains first 276 | prompt_template = ( 277 | "Here is a list of domains:\n" 278 | "{domains}\n\n" 279 | "It's your job to output unique new domains that are likely to exist " 280 | "based on variations or predictive patterns you see in the existing list. " 281 | "In your output, none of the domains should repeat. " 282 | "Please output them one domain per line." 
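# NOTE: this template is slightly shorter than the prompt actually sent
# by generate_new_domains(), so the estimate is best treated as a
# close lower bound on real per-iteration usage.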
283 | ) 284 | 285 | # Get base token count without domains 286 | base_tokens = len(enc.encode(prompt_template.format(domains=""))) 287 | 288 | # Calculate how many domains we can include 289 | truncated_domains = [] 290 | current_tokens = base_tokens 291 | 292 | for domain in domain_list: 293 | domain_tokens = len(enc.encode(domain + ", ")) 294 | if current_tokens + domain_tokens > MAX_TOKENS: 295 | break 296 | truncated_domains.append(domain) 297 | current_tokens += domain_tokens 298 | 299 | # Calculate final token count with actual domains 300 | final_prompt = prompt_template.format(domains=", ".join(truncated_domains)) 301 | total_tokens = len(enc.encode(final_prompt)) 302 | 303 | return truncated_domains, total_tokens 304 | 305 | 306 | def generate_domains_with_wrn(domain_list, verbose=False): 307 | """ 308 | A standalone function to generate domains using White Rabbit Neo. 309 | This is kept separate from the LLM abstraction for simplicity and reliability. 310 | """ 311 | import openai 312 | import os 313 | import random 314 | 315 | try: 316 | client = openai.OpenAI( 317 | base_url="https://llm.kindo.ai/v1", 318 | api_key=os.environ["KINDO_API_KEY"], 319 | default_headers={ 320 | "content-type": "application/json", 321 | "api-key": os.environ["KINDO_API_KEY"] 322 | } 323 | ) 324 | except (openai.OpenAIError, KeyError) as e: 325 | print("Error initializing White Rabbit Neo: Make sure KINDO_API_KEY environment variable is set.", file=sys.stderr) 326 | sys.exit(1) 327 | 328 | # Randomize order of domains before sending to the model 329 | shuffled_domains = domain_list[:] 330 | random.shuffle(shuffled_domains) 331 | 332 | # Build the prompt 333 | prompt_text = ( 334 | "Here is a list of domains:\n" 335 | f"{', '.join(shuffled_domains)}\n\n" 336 | "It's your job to output unique new domains that are likely to exist " 337 | "based on variations or predictive patterns you see in the existing list. " 338 | "In your output, none of the domains should repeat. " 339 | "Please output them one domain per line. Only output the domains, no other text. No paths. No protocols. Output should be a list of domains. If they only pass in a root domain, still output potential subdomains." 340 | ) 341 | 342 | if verbose: 343 | print("[DEBUG] Prompt to WRN:") 344 | print(prompt_text) 345 | print() 346 | 347 | try: 348 | response = client.chat.completions.create( 349 | model="/models/WhiteRabbitNeo-33B-DeepSeekCoder", 350 | messages=[ 351 | {"role": "system", "content": "You are tasked with thinking of similar domains."}, 352 | {"role": "user", "content": prompt_text} 353 | ], 354 | temperature=1, 355 | top_p=0.95, 356 | max_tokens=8192, 357 | stream=False, 358 | ) 359 | raw_output = response.choices[0].message.content.strip() 360 | new_candidates = re.findall(DOMAIN_REGEX, raw_output) 361 | return new_candidates 362 | 363 | except Exception as e: 364 | print(f"\n[!] Error during WRN domain generation: {str(e)}", file=sys.stderr) 365 | return [] 366 | 367 | 368 | def main(): 369 | args = parse_args() 370 | 371 | parse_model_config(args.model_config) 372 | 373 | # Get initial domain list and check if using stdin 374 | using_stdin = not sys.stdin.isatty() 375 | seed_domains = get_seed_domains(args) 376 | if not seed_domains: 377 | print("[!] No seed domains provided. 
Use -t, -tL, or pipe domains to stdin.", file=sys.stderr) 378 | sys.exit(1) 379 | 380 | # Get token-truncated domain list and count 381 | seed_domains, estimated_tokens = estimate_tokens(seed_domains) 382 | estimated_total = estimated_tokens * args.loop 383 | 384 | # Handle White Rabbit Neo separately from other models 385 | if str(args.model).lower() == "whiterabbitneo": 386 | print("Using White Rabbit Neo model") 387 | 388 | # Skip confirmation if using --force or stdin 389 | if not (args.force or using_stdin): 390 | print(f"\nEstimated token usage:") 391 | print(f"* Per iteration: ~{estimated_tokens} tokens") 392 | print(f"* Total for {args.loop} loops: ~{estimated_total} tokens") 393 | response = input("\nContinue? [y/N] ").lower() 394 | if response != 'y': 395 | print("Aborting.") 396 | sys.exit(0) 397 | elif args.verbose: 398 | print(f"\nEstimated token usage:") 399 | print(f"* Per iteration: ~{estimated_tokens} tokens") 400 | print(f"* Total for {args.loop} loops: ~{estimated_total} tokens") 401 | 402 | # We store all domains in a global set to avoid duplicates across loops 403 | all_domains = set(seed_domains) 404 | # Keep track of original domains to exclude from output 405 | original_domains = set(seed_domains) 406 | 407 | print("\nGenerating domains... This may take a minute or two depending on the number of iterations.") 408 | 409 | # Loop for the specified number of times 410 | for i in range(args.loop): 411 | if args.verbose: 412 | print(f"\n[+] WRN Generation Loop {i + 1}/{args.loop}...") 413 | 414 | new_domains = generate_domains_with_wrn(list(all_domains), verbose=args.verbose) 415 | 416 | if args.no_repeats: 417 | # Filter out anything we already have 418 | new_domains = [d for d in new_domains if d not in all_domains] 419 | 420 | # Add them to our global set 421 | before_count = len(all_domains) 422 | for dom in new_domains: 423 | all_domains.add(dom) 424 | after_count = len(all_domains) 425 | 426 | if args.verbose: 427 | print(f"[DEBUG] WRN suggested {len(new_domains)} new domain(s). " 428 | f"{after_count - before_count} were added (others were duplicates?).") 429 | 430 | # If we have a limit, check it now 431 | if 0 < args.limit <= len(all_domains): 432 | if args.verbose: 433 | print(f"[!] 
Reached limit of {args.limit} domains.") 434 | break 435 | 436 | # Get only the new domains (excluding original seed domains) 437 | new_domains = sorted(all_domains - original_domains) 438 | 439 | # Output handling 440 | if args.output: 441 | with open(args.output, 'w') as f: 442 | for dom in new_domains: 443 | f.write(f"{dom}\n") 444 | print(f"\nResults written to: {args.output}") 445 | else: 446 | print("\n=== New Generated Domains ===") 447 | for dom in new_domains: 448 | print(dom) 449 | 450 | if args.verbose: 451 | print(f"\n[DEBUG] Original domains: {len(original_domains)}") 452 | print(f"[DEBUG] New domains generated: {len(new_domains)}") 453 | print(f"[DEBUG] Total domains processed: {len(all_domains)}") 454 | 455 | sys.exit(0) 456 | 457 | # Handle other models using the LLM abstraction 458 | match (str(args.model).lower()): 459 | case "gemini": 460 | print("Using model " + args.model) 461 | llm_model = Gemini( 462 | { 463 | "model_name": "gemini-1.5-flash", 464 | "temperature": 1, 465 | "top_p": 0.95, 466 | "max_output_tokens": 8192, 467 | } 468 | ) 469 | case "openai": 470 | print("Using model " + args.model) 471 | llm_model = OpenAI( 472 | { 473 | "model_name": "gpt-4o", 474 | "temperature": 1, 475 | "top_p": 0.95, 476 | "top_k": 40, 477 | "max_tokens": 8192, 478 | "response_mime_type": "text/plain", 479 | } 480 | ) 481 | case "ollama": 482 | print("Using model " + args.model) 483 | ollama_models = ollama.list() 484 | if len(ollama_models.models) == 0: 485 | print("No Ollama models found. Please install one before using this model type.", file=sys.stderr) 486 | sys.exit(1) 487 | else: 488 | print(f"Defaulting to Ollama model: {ollama_models.models[0].model}") 489 | llm_model = Ollama( 490 | { 491 | "model_name": ollama_models.models[0].model, 492 | "temperature": 1, 493 | "top_p": 0.95, 494 | "top_k": 40, 495 | } 496 | ) 497 | case _: 498 | print("Model type must be either 'gemini','openai','whiterabbitneo', or 'ollama'", file=sys.stderr) 499 | sys.exit(1) 500 | 501 | # Skip confirmation if using --force or stdin 502 | if not (args.force or using_stdin): 503 | print(f"\nEstimated token usage:") 504 | print(f"* Per iteration: ~{estimated_tokens} tokens") 505 | print(f"* Total for {args.loop} loops: ~{estimated_total} tokens") 506 | response = input("\nContinue? [y/N] ").lower() 507 | if response != 'y': 508 | print("Aborting.") 509 | sys.exit(0) 510 | elif args.verbose: 511 | print(f"\nEstimated token usage:") 512 | print(f"* Per iteration: ~{estimated_tokens} tokens") 513 | print(f"* Total for {args.loop} loops: ~{estimated_total} tokens") 514 | 515 | # We store all domains in a global set to avoid duplicates across loops 516 | all_domains = set(seed_domains) 517 | # Keep track of original domains to exclude from output 518 | original_domains = set(seed_domains) 519 | 520 | print("\nGenerating domains... 
This may take a minute or two depending on the number of iterations.") 521 | 522 | # Loop for the specified number of times 523 | for i in range(args.loop): 524 | if args.verbose: 525 | print(f"\n[+] LLM Generation Loop {i + 1}/{args.loop}...") 526 | 527 | new_domains = generate_new_domains(llm_model, list(all_domains), verbose=args.verbose) 528 | 529 | if args.no_repeats: 530 | # Filter out anything we already have 531 | new_domains = [d for d in new_domains if d not in all_domains] 532 | 533 | # Add them to our global set 534 | before_count = len(all_domains) 535 | for dom in new_domains: 536 | all_domains.add(dom) 537 | after_count = len(all_domains) 538 | 539 | if args.verbose: 540 | print(f"[DEBUG] LLM suggested {len(new_domains)} new domain(s). " 541 | f"{after_count - before_count} were added (others were duplicates?).") 542 | 543 | # If we have a limit, check it now 544 | if 0 < args.limit <= len(all_domains): 545 | if args.verbose: 546 | print(f"[!] Reached limit of {args.limit} domains.") 547 | break 548 | 549 | # Get only the new domains (excluding original seed domains) 550 | new_domains = sorted(all_domains - original_domains) 551 | 552 | # Output handling 553 | if args.output: 554 | with open(args.output, 'w') as f: 555 | for dom in new_domains: 556 | f.write(f"{dom}\n") 557 | print(f"\nResults written to: {args.output}") 558 | else: 559 | print("\n=== New Generated Domains ===") 560 | for dom in new_domains: 561 | print(dom) 562 | 563 | if args.verbose: 564 | print(f"\n[DEBUG] Original domains: {len(original_domains)}") 565 | print(f"[DEBUG] New domains generated: {len(new_domains)}") 566 | print(f"[DEBUG] Total domains processed: {len(all_domains)}") 567 | 568 | if __name__ == "__main__": 569 | main() 570 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | google-generativeai>=0.3.0 2 | tiktoken>=0.6.0 3 | openai>=1.42.0 4 | ollama>=0.4.7 --------------------------------------------------------------------------------