├── requirements.txt ├── .gitignore ├── LICENSE ├── README.md └── popcorn_storyboard.py /requirements.txt: -------------------------------------------------------------------------------- 1 | pydantic 2 | aiohttp 3 | python-dotenv 4 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | secrets.py 2 | .env 3 | __pycache__/ 4 | *.pyc 5 | popcorn_output/ 6 | sequence_*/ 7 | .DS_Store 8 | .vscode/ 9 | .idea/ 10 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Open Higgsfield Popcorn Contributors 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Open Higgsfield Popcorn 🍿 2 | 3 | An open-source alternative to **Higgsfield Popcorn**, designed to generate consistent, cinematic storyboards and visual sequences using AI. 4 | 5 | ## What is Higgsfield Popcorn? 6 | 7 | [Higgsfield Popcorn](https://higgsfield.ai) is a powerful AI tool for creators that generates consistent character and environment sequences for storyboards, marketing campaigns, and visual storytelling. It solves a major pain point in AI image generation: **consistency**. It allows users to create 4-8 frames that look like they belong to the same movie or narrative, maintaining character identity and visual style across different shots and angles. 8 | 9 | ## About This Project 10 | 11 | **Open Higgsfield Popcorn** is an open-source implementation inspired by the original tool. It leverages the power of **MuAPI** (using models like `gpt-5-mini` and `nano-banana`) to achieve similar results: creating coherent, multi-frame visual stories from text prompts and reference images. 12 | 13 | ### Key Features 14 | 15 | * **Consistent Storytelling**: Generates 2-12 frames that maintain visual consistency in style, characters, and lighting. 16 | * **Auto Mode**: simply provide a prompt (e.g., "detective investigating a crime scene") and let the AI plan and generate the entire sequence. 17 | * **Manual Mode**: Have full control by specifying the description for each shot individually. 18 | * **Reference-Driven**: Use character and environment reference images to guide the generation and ensure identity consistency. 19 | * **Cinematic Planning**: Uses an LLM "Director" to intelligently plan shot types (wide, close-up, etc.) and camera angles based on your narrative context. 20 | * **Style Control**: Choose from various visual styles (Cinematic Realistic, Anime, Noir, etc.). 21 | 22 | ## Installation 23 | 24 | 1. Clone this repository. 25 | 2. Install dependencies: 26 | ```bash 27 | pip install requests python-dotenv pydantic aiohttp 28 | ``` 29 | 3. Configure your API keys: 30 | * Rename `secrets.py` or edit it directly. 31 | * Add your **MuAPI** key (`MUAPIAPP_API_KEY`). You can get one from [muapi.ai](https://muapi.ai). 32 | 33 | ## Usage 34 | 35 | ### Auto Mode 36 | Generate a sequence from a single prompt. The AI will plan the shots for you. 37 | 38 | ```bash 39 | python popcorn_storyboard.py --prompt "A cyberpunk hacker breaking into a secure server room" --frames 6 --style "cyberpunk neon" 40 | ``` 41 | 42 | ### Manual Mode 43 | Define exactly what happens in each frame. 44 | 45 | ```bash 46 | python popcorn_storyboard.py --manual_shots "wide shot of a spooky house" "close up of a hand opening the door" "interior view of a dusty hallway" --style "horror" 47 | ``` 48 | 49 | ### Using References 50 | Upload reference images (e.g., your main character or a specific location) to guide the AI. 51 | 52 | ```bash 53 | python popcorn_storyboard.py --prompt "A knight fighting a dragon" --references https://example.com/knight.png https://example.com/dragon.png 54 | ``` 55 | 56 | ## Options 57 | 58 | * `--prompt`: The main story or scene description (Required for Auto Mode). 59 | * `--manual_shots`: List of descriptions for each frame (Enables Manual Mode). 60 | * `--frames`: Number of frames to generate (Default: 6). 61 | * `--style`: Visual style of the sequence (Default: "cinematic realistic"). 62 | * `--references`: URLs or paths to reference images (Up to 4 recommended). 63 | * `--output`: Directory to save the results. 64 | 65 | ## License 66 | 67 | This project is open-source and available under the MIT License. 68 | -------------------------------------------------------------------------------- /popcorn_storyboard.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Popcorn Storyboard Generator 4 | Inspired by Higgsfield Popcorn - Generate 4-8 consistent frames from text + reference images 5 | 6 | Usage: 7 | python popcorn_storyboard.py --prompt "detective investigating a crime scene" --references https://example.com/char.png https://example.com/env.png --frames 6 8 | """ 9 | 10 | import asyncio 11 | import argparse 12 | import logging 13 | import json 14 | import sys 15 | from pathlib import Path 16 | from typing import List, Optional, Dict 17 | from datetime import datetime 18 | from pydantic import BaseModel 19 | import aiohttp 20 | 21 | import os 22 | import time 23 | from dotenv import load_dotenv 24 | load_dotenv() 25 | 26 | from secrets import MUAPIAPP_API_KEY 27 | 28 | class LLMClient: 29 | def __init__(self, api_key: str, model: str = "gpt-5-mini"): 30 | self.api_key = api_key 31 | self.model = model 32 | self.base_url = "https://api.muapi.ai/api/v1" 33 | 34 | async def call(self, system_prompt: str, user_prompt: str, response_format: type = None, temperature: float = 0.7) -> dict: 35 | url = f"{self.base_url}/{self.model}" 36 | headers = { 37 | "Content-Type": "application/json", 38 | "x-api-key": self.api_key 39 | } 40 | 41 | full_prompt = f"{system_prompt}\n\n{user_prompt}" if system_prompt else user_prompt 42 | 43 | payload = { 44 | "prompt": full_prompt, 45 | } 46 | 47 | begin = time.time() 48 | 49 | async with aiohttp.ClientSession() as session: 50 | async with session.post(url, headers=headers, json=payload) as response: 51 | if response.status != 200: 52 | text = await response.text() 53 | raise Exception(f"Error: {response.status}, {text}") 54 | 55 | result = await response.json() 56 | request_id = result["request_id"] 57 | logger.info(f"Task submitted. Request ID: {request_id}") 58 | 59 | result_url = f"{self.base_url}/predictions/{request_id}/result" 60 | headers = {"x-api-key": self.api_key} 61 | 62 | while True: 63 | async with session.get(result_url, headers=headers) as response: 64 | if response.status == 200: 65 | result = await response.json() 66 | status = result["status"] 67 | 68 | if status == "completed": 69 | end = time.time() 70 | logger.info(f"Task completed in {end - begin:.2f} seconds.") 71 | text = result["outputs"][0] 72 | 73 | if response_format == dict: 74 | try: 75 | clean_text = text.strip() 76 | if clean_text.startswith("```json"): 77 | clean_text = clean_text[7:] 78 | if clean_text.startswith("```"): 79 | clean_text = clean_text[3:] 80 | if clean_text.endswith("```"): 81 | clean_text = clean_text[:-3] 82 | return json.loads(clean_text) 83 | except json.JSONDecodeError: 84 | logger.warning("Failed to parse JSON from LLM response") 85 | return {"raw_output": text} 86 | return text 87 | 88 | elif status == "failed": 89 | raise Exception(f"Task failed: {result.get('error')}") 90 | else: 91 | text = await response.text() 92 | raise Exception(f"Error: {response.status}, {text}") 93 | 94 | await asyncio.sleep(0.5) 95 | 96 | class NanoBananaImageGenerator: 97 | def __init__(self, api_key: str): 98 | self.api_key = api_key 99 | self.base_url = "https://api.muapi.ai/api/v1" 100 | 101 | async def _submit_and_poll(self, model: str, payload: dict) -> str: 102 | url = f"{self.base_url}/{model}" 103 | headers = { 104 | "Content-Type": "application/json", 105 | "x-api-key": self.api_key 106 | } 107 | 108 | async with aiohttp.ClientSession() as session: 109 | # Submit 110 | async with session.post(url, headers=headers, json=payload) as response: 111 | if response.status != 200: 112 | text = await response.text() 113 | raise Exception(f"Error submitting to {model}: {response.status}, {text}") 114 | result = await response.json() 115 | request_id = result["request_id"] 116 | logger.info(f"Task submitted to {model}. Request ID: {request_id}") 117 | 118 | # Poll 119 | result_url = f"{self.base_url}/predictions/{request_id}/result" 120 | headers = {"x-api-key": self.api_key} 121 | 122 | while True: 123 | async with session.get(result_url, headers=headers) as response: 124 | if response.status != 200: 125 | text = await response.text() 126 | raise Exception(f"Polling error: {response.status}, {text}") 127 | 128 | result = await response.json() 129 | status = result["status"] 130 | 131 | if status == "completed": 132 | return result["outputs"][0] 133 | elif status == "failed": 134 | raise Exception(f"Task failed: {result.get('error')}") 135 | 136 | await asyncio.sleep(0.5) 137 | 138 | async def generate_image(self, prompt: str, aspect_ratio: str = "16:9") -> str: 139 | """Generate image from text (T2I)""" 140 | payload = { 141 | "prompt": prompt, 142 | "aspect_ratio": aspect_ratio 143 | } 144 | return await self._submit_and_poll("nano-banana", payload) 145 | 146 | async def generate_with_references(self, prompt: str, reference_images: List[str], aspect_ratio: str = "16:9") -> str: 147 | """Generate image with references (I2I/Edit)""" 148 | payload = { 149 | "prompt": prompt, 150 | "images_list": reference_images, 151 | "aspect_ratio": aspect_ratio 152 | } 153 | return await self._submit_and_poll("nano-banana-edit", payload) 154 | 155 | # Setup logging 156 | logging.basicConfig( 157 | level=logging.INFO, 158 | format='%(asctime)s - %(levelname)s - %(message)s' 159 | ) 160 | logger = logging.getLogger(__name__) 161 | 162 | 163 | # ============================================================================ 164 | # VISION ANALYSIS 165 | # ============================================================================ 166 | 167 | async def ensure_url(image_path: str) -> str: 168 | """Validate that the input is a URL.""" 169 | if not image_path.startswith(("http://", "https://")): 170 | raise ValueError(f"Invalid reference: '{image_path}'. References must be image URLs (starting with http:// or https://). Local files are not supported.") 171 | return image_path 172 | 173 | 174 | class VisionAdapter: 175 | """ 176 | Vision-capable adapter using GPT-5-Nano: 177 | - Accepts image URLs 178 | - Produces structured analysis + description 179 | """ 180 | 181 | BASE_URL = "https://api.muapi.ai/api/v1" 182 | MODEL = "gpt-5-nano" 183 | 184 | def __init__(self, api_key: str): 185 | self.api_key = api_key 186 | if not self.api_key: 187 | raise ValueError("API key missing") 188 | 189 | async def analyze( 190 | self, 191 | prompt: str, 192 | image_url: str 193 | ) -> dict: 194 | """ 195 | Given an image URL, produce structured analysis. 196 | Returns a dict: { summary, tags, objects, colors, suggestions } 197 | """ 198 | # Ensure we have a URL 199 | image_url = await ensure_url(image_url) 200 | 201 | submit_url = f"{self.BASE_URL}/{self.MODEL}" 202 | 203 | headers = { 204 | "Content-Type": "application/json", 205 | "x-api-key": self.api_key, 206 | } 207 | 208 | payload = { 209 | "prompt": f"Analyze these images and return structured JSON: {prompt}", 210 | "image_url": image_url, 211 | } 212 | 213 | async with aiohttp.ClientSession() as session: 214 | # Submit 215 | logger.info(f"[VisionAdapter] Submitting analysis request for {image_url}") 216 | async with session.post(submit_url, headers=headers, json=payload) as resp: 217 | if resp.status != 200: 218 | logger.error(f"[VisionAdapter] Submit failed: {resp.status}") 219 | raise RuntimeError(f"MuAPI vision submit error {resp.status}: {await resp.text()}") 220 | data = await resp.json() 221 | request_id = data["request_id"] 222 | logger.info(f"[VisionAdapter] Request submitted. ID: {request_id}") 223 | 224 | # Poll 225 | result_url = f"{self.BASE_URL}/predictions/{request_id}/result" 226 | 227 | while True: 228 | await asyncio.sleep(0.5) 229 | 230 | async with session.get(result_url, headers={"x-api-key": self.api_key}) as resp: 231 | if resp.status != 200: 232 | raise RuntimeError(f"MuAPI poll error {resp.status}: {await resp.text()}") 233 | 234 | result = await resp.json() 235 | status = result["status"] 236 | 237 | if status == "completed": 238 | logger.info(f"[VisionAdapter] Request {request_id} completed.") 239 | output_text = result["outputs"][0] 240 | 241 | # Try to parse JSON (if model outputs JSON) 242 | try: 243 | return json.loads(output_text) 244 | except: 245 | # If it's plain text, wrap it 246 | return {"summary": output_text} 247 | 248 | if status == "failed": 249 | logger.info(f"[VisionAdapter] Request {request_id} failed.") 250 | raise RuntimeError(f"MuAPI vision task failed: {result.get('error')}") 251 | 252 | logger.debug(f"[VisionAdapter] Polling {request_id}: {status}...") 253 | 254 | 255 | # ============================================================================ 256 | # DATA MODELS 257 | # ============================================================================ 258 | 259 | class ReferenceImage(BaseModel): 260 | """Analyzed reference image""" 261 | url: str 262 | type: str # "character", "environment", "prop", "lighting" 263 | description: str 264 | key_features: List[str] 265 | 266 | 267 | class FramePlan(BaseModel): 268 | """Plan for a single frame in the sequence""" 269 | frame_number: int 270 | shot_type: str # "wide", "medium", "close_up", "extreme_close_up" 271 | camera_angle: str # "eye_level", "low_angle", "high_angle", "dutch_angle" 272 | description: str 273 | focus_elements: List[str] 274 | composition_notes: str 275 | duration_hint: float = 1.5 # seconds per frame 276 | background_id: str # Added this field 277 | 278 | 279 | class BackgroundPlan(BaseModel): 280 | """Plan for a background/environment""" 281 | id: str # e.g., "bg_1", "kitchen", "studio" 282 | description: str 283 | frames: List[int] # List of frame numbers that use this background 284 | 285 | 286 | class PopcornSequence(BaseModel): 287 | """Complete sequence plan""" 288 | prompt: str 289 | num_frames: int 290 | references: List[ReferenceImage] 291 | frames: List[FramePlan] 292 | backgrounds: List[BackgroundPlan] # Added this field 293 | style: str 294 | consistency_rules: str 295 | 296 | 297 | # ============================================================================ 298 | # REFERENCE ANALYZER 299 | # ============================================================================ 300 | 301 | class ReferenceAnalyzer: 302 | """Analyze reference images using vision AI""" 303 | 304 | def __init__(self, vision: VisionAdapter, llm: LLMClient): 305 | self.vision = vision 306 | self.llm = llm 307 | 308 | async def analyze_reference(self, image_url: str, index: int) -> ReferenceImage: 309 | """Analyze a single reference image""" 310 | logger.info(f" Analyzing reference {index + 1}...") 311 | 312 | # Step 1: Vision analysis 313 | analysis_prompt = """Describe this image in detail: 314 | - What type of subject is this? (character, environment, object, lighting setup) 315 | - Key visual features (colors, textures, style, mood) 316 | - Notable details for consistency (clothing, architecture, props) 317 | 318 | Return JSON: {"type": "...", "description": "...", "key_features": [...]}""" 319 | 320 | vision_result = await self.vision.analyze(analysis_prompt, image_url) 321 | 322 | # Step 2: Structure the analysis 323 | ref_type = vision_result.get("type", "unknown") 324 | description = vision_result.get("description", vision_result.get("summary", "")) 325 | features = vision_result.get("key_features", []) 326 | 327 | # If features not provided, extract from description 328 | if not features and description: 329 | features = description.split(", ")[:5] 330 | 331 | return ReferenceImage( 332 | url=image_url, 333 | type=ref_type, 334 | description=description, 335 | key_features=features 336 | ) 337 | 338 | 339 | # ============================================================================ 340 | # SEQUENCE PLANNER 341 | # ============================================================================ 342 | 343 | class SequencePlanner: 344 | """Plan shot sequence - adapts to ANY use case (storytelling, product, architecture, etc.)""" 345 | 346 | def __init__(self, llm: LLMClient): 347 | self.llm = llm 348 | 349 | async def plan_sequence( 350 | self, 351 | prompt: str, 352 | num_frames: int, 353 | references: List[ReferenceImage], 354 | style: str = "cinematic realistic" 355 | ) -> PopcornSequence: 356 | """Plan a coherent sequence of frames - adapts to prompt context""" 357 | logger.info(f"Planning {num_frames}-frame sequence...") 358 | 359 | # Build context from references 360 | ref_context = self._build_reference_context(references) 361 | 362 | # Create planning prompt - LET THE LLM DECIDE THE APPROACH 363 | planning_prompt = f"""You are a professional visual planner. Analyze the user's request and plan {num_frames} frames accordingly. 364 | 365 | USER PROMPT: "{prompt}" 366 | 367 | REFERENCE IMAGES AVAILABLE: 368 | {ref_context} 369 | 370 | STYLE: {style} 371 | 372 | TASK: Plan {num_frames} frames that best fulfill the user's prompt. 373 | 374 | CONTEXT DETECTION: 375 | - If prompt is about storytelling/narrative → plan cinematic shots (wide, close-up, camera angles) 376 | - If prompt is about product photography → plan product shots (front view, side view, top view, detail shots) 377 | - If prompt is about architecture → plan architectural views (exterior, interior, details, context) 378 | - If prompt is about fashion → plan fashion shots (full body, detail shots, poses) 379 | - If prompt is about anything else → adapt intelligently 380 | 381 | SHOT TYPE GUIDANCE: 382 | - Storytelling: "wide_shot", "close_up", "medium_shot", "over_shoulder", etc. 383 | - Product: "front_view", "side_view", "top_view", "45_degree_angle", "detail_shot", "lifestyle_shot" 384 | - Architecture: "exterior_wide", "interior_view", "detail_element", "aerial_view" 385 | - Fashion: "full_body", "upper_body", "detail_closeup", "editorial_pose" 386 | - Generic: describe the view naturally 387 | 388 | CAMERA ANGLE GUIDANCE: 389 | - Storytelling: "eye_level", "low_angle", "high_angle", "dutch_angle" 390 | - Product: "straight_on", "slightly_elevated", "45_degree", "overhead" 391 | - Architecture: "street_level", "elevated", "birds_eye" 392 | - Fashion: "eye_level", "low_angle", "high_fashion_angle" 393 | - Generic: describe the angle naturally 394 | 395 | Each frame should: 396 | 1. Use the reference images for consistency 397 | 2. Vary views/angles for visual interest 398 | 3. Maintain consistent lighting and style 399 | 4. Create a natural progression 400 | 401 | BACKGROUND/ENVIRONMENT HANDLING: 402 | - Analyze the prompt to determine needed backgrounds 403 | - Product photography/static scenes → Usually 1 consistent background 404 | - Storytelling with movement → May need multiple backgrounds 405 | - Define the specific backgrounds needed for this sequence 406 | 407 | Examples: 408 | - Prompt: "product shots for e-commerce" → All frames: same white/studio background 409 | - Prompt: "hero escapes building and runs to car" → Frame 1: building interior, Frame 2-3: street, Frame 4: car 410 | - Prompt: "detective in crowded market" → Background: "Bustling market street with colorful stalls, awnings, cobblestone path, and atmospheric lighting" (Note: NO mention of detective) 411 | - Prompt: "orange cup on table" → Background: "Empty wooden table surface with blurred kitchen background" (Note: NO mention of cup) 412 | 413 | CRITICAL CONSISTENCY RULES: 414 | - Subject/product/character must look IDENTICAL in every frame 415 | - Lighting approach and color palette must match across all frames 416 | - Style must be uniform ({style}) 417 | - Background consistency: decide based on prompt context 418 | 419 | Return JSON: 420 | {{ 421 | "backgrounds": [ 422 | {{ 423 | "id": "bg_1", 424 | "description": "Detailed description of the SETTING ONLY. Do NOT mention the main character, product, or subject. Describe the environment as if the subject is not there.", 425 | "frames": [1, 2, 3, 4] 426 | }} 427 | ], 428 | "frames": [ 429 | {{ 430 | "frame_number": 1, 431 | "shot_type": "", 432 | "camera_angle": "", 433 | "description": "Detailed description of what we see in this frame", 434 | "focus_elements": ["element1", "element2"], 435 | "composition_notes": "Technical composition details", 436 | "background_id": "bg_1" 437 | }}, 438 | ... 439 | ], 440 | "consistency_rules": "Summary of what must stay consistent across all frames" 441 | }} 442 | 443 | IMPORTANT: 444 | 1. Choose shot_type and camera_angle terminology that matches the context of the prompt. 445 | 2. Define backgrounds separately in the 'backgrounds' list. 446 | 3. Link each frame to a background_id.""" 447 | 448 | response = await self.llm.call( 449 | system_prompt="You are an expert visual planner who adapts to any creative brief - from film to product photography to architecture to fashion and beyond.", 450 | user_prompt=planning_prompt, 451 | response_format=dict, 452 | temperature=0.4 453 | ) 454 | 455 | # Parse response 456 | frames = [] 457 | for f in response["frames"]: 458 | # Ensure background_id exists, default to first bg if missing 459 | if "background_id" not in f and response.get("backgrounds"): 460 | f["background_id"] = response["backgrounds"][0]["id"] 461 | frames.append(FramePlan(**f)) 462 | 463 | backgrounds = [BackgroundPlan(**b) for b in response.get("backgrounds", [])] 464 | consistency_rules = response.get("consistency_rules", "Maintain visual consistency") 465 | 466 | return PopcornSequence( 467 | prompt=prompt, 468 | num_frames=num_frames, 469 | references=references, 470 | frames=frames, 471 | backgrounds=backgrounds, 472 | style=style, 473 | consistency_rules=consistency_rules 474 | ) 475 | 476 | def _build_reference_context(self, references: List[ReferenceImage]) -> str: 477 | """Build text context from reference images""" 478 | lines = [] 479 | for i, ref in enumerate(references): 480 | lines.append(f"Reference {i+1} ({ref.type}):") 481 | lines.append(f" Description: {ref.description}") 482 | lines.append(f" Key features: {', '.join(ref.key_features)}") 483 | return "\n".join(lines) 484 | 485 | 486 | # ============================================================================ 487 | # BACKGROUND GENERATOR 488 | # ============================================================================ 489 | 490 | class BackgroundGenerator: 491 | """Generate background reference images""" 492 | 493 | def __init__(self, image_gen: NanoBananaImageGenerator, output_dir: Path, style: str): 494 | self.image_gen = image_gen 495 | self.output_dir = output_dir / "backgrounds" 496 | self.output_dir.mkdir(parents=True, exist_ok=True) 497 | self.style = style 498 | 499 | async def generate_backgrounds(self, sequence: PopcornSequence) -> Dict[str, Dict[str, str]]: 500 | """Generate all planned backgrounds. Returns dict of id -> {path: local_path, url: original_url}""" 501 | logger.info(f"Generating {len(sequence.backgrounds)} background(s)...") 502 | 503 | bg_data = {} 504 | 505 | for bg in sequence.backgrounds: 506 | logger.info(f" Generating background '{bg.id}': {bg.description[:50]}...") 507 | 508 | # Build prompt - Clean environment description 509 | # We rely on the planner to have removed the subject from the description 510 | prompt = ( 511 | f"{self.style} style. {bg.description}. " 512 | "background reference, location shot, environmental view, high quality, 4k" 513 | ) 514 | 515 | # Generate 516 | bg_url = await self.image_gen.generate_image( 517 | prompt=prompt, 518 | aspect_ratio="16:9" 519 | ) 520 | 521 | # Download 522 | bg_filename = f"{bg.id}.png" 523 | bg_path_local = self.output_dir / bg_filename 524 | 525 | try: 526 | async with aiohttp.ClientSession() as session: 527 | async with session.get(bg_url) as response: 528 | if response.status == 200: 529 | with open(bg_path_local, 'wb') as f: 530 | while True: 531 | chunk = await response.content.read(1024) 532 | if not chunk: 533 | break 534 | f.write(chunk) 535 | bg_data[bg.id] = {"path": str(bg_path_local), "url": bg_url} 536 | logger.info(f" ✓ Saved: {bg_path_local.name}") 537 | else: 538 | logger.error(f" ✗ Failed to download background {bg.id}") 539 | bg_data[bg.id] = {"path": bg_url, "url": bg_url} 540 | except Exception as e: 541 | logger.error(f" ✗ Download error for {bg.id}: {e}") 542 | bg_data[bg.id] = {"path": bg_url, "url": bg_url} 543 | 544 | return bg_data 545 | 546 | 547 | # ============================================================================ 548 | # FRAME GENERATOR 549 | # ============================================================================ 550 | 551 | class FrameGenerator: 552 | """Generate consistent frames using nano-banana-edit""" 553 | 554 | def __init__(self, image_gen: NanoBananaImageGenerator, output_dir: Path, style: str): 555 | self.image_gen = image_gen 556 | self.output_dir = output_dir / "frames" 557 | self.output_dir.mkdir(parents=True, exist_ok=True) 558 | self.style = style 559 | 560 | async def generate_frames( 561 | self, 562 | sequence: PopcornSequence, 563 | bg_data: Dict[str, Dict[str, str]] 564 | ) -> List[str]: 565 | """Generate all frames using reference images for consistency""" 566 | logger.info(f"Generating {sequence.num_frames} frames...") 567 | 568 | # Collect user-provided reference URLs 569 | user_ref_urls = [ref.url for ref in sequence.references] 570 | 571 | # Track first frame details for consistency 572 | first_frame_details = None 573 | 574 | # Generate frames 575 | frame_paths = [] 576 | for frame in sequence.frames: 577 | logger.info(f" Frame {frame.frame_number}/{sequence.num_frames}: {frame.shot_type}") 578 | 579 | # Build frame prompt with detail consistency 580 | prompt = self._build_frame_prompt(frame, sequence, first_frame_details) 581 | 582 | # Extract details from first frame for future consistency 583 | if frame.frame_number == 1: 584 | first_frame_details = self._extract_detail_hints(frame) 585 | 586 | # Prepare references for this frame 587 | current_frame_refs = user_ref_urls.copy() 588 | current_bg_info = bg_data.get(frame.background_id) 589 | if current_bg_info and current_bg_info.get("url"): 590 | current_frame_refs.append(current_bg_info["url"]) 591 | 592 | # Generate with references 593 | frame_url = await self.image_gen.generate_with_references( 594 | prompt=prompt, 595 | reference_images=current_frame_refs, 596 | aspect_ratio="16:9" 597 | ) 598 | 599 | # Download frame 600 | frame_filename = f"frame_{frame.frame_number:02d}.png" 601 | frame_path_local = self.output_dir / frame_filename 602 | 603 | try: 604 | async with aiohttp.ClientSession() as session: 605 | async with session.get(frame_url) as response: 606 | if response.status == 200: 607 | with open(frame_path_local, 'wb') as f: 608 | while True: 609 | chunk = await response.content.read(1024) 610 | if not chunk: 611 | break 612 | f.write(chunk) 613 | frame_paths.append(str(frame_path_local)) 614 | logger.info(f" ✓ Saved: {frame_path_local.name}") 615 | else: 616 | frame_paths.append(frame_url) 617 | except Exception as e: 618 | logger.error(f" ✗ Download error for frame {frame.frame_number}: {e}") 619 | frame_paths.append(frame_url) 620 | 621 | return frame_paths 622 | 623 | def _extract_detail_hints(self, frame: FramePlan) -> str: 624 | """Extract detail-level consistency hints from first frame description""" 625 | desc = frame.description.lower() 626 | hints = [] 627 | 628 | # SUBJECT/CHARACTER DETAILS (accessories, props that should stay consistent) 629 | # Check for accessories/props mentioned 630 | if "glove" in desc: 631 | hints.append("wearing gloves") 632 | elif "bare hand" in desc or "no glove" in desc: 633 | hints.append("bare hands (no gloves)") 634 | 635 | if "watch" in desc: 636 | hints.append("wearing watch") 637 | 638 | if "glass" in desc and "wearing" in desc: 639 | hints.append("wearing glasses") 640 | elif "no glass" in desc: 641 | hints.append("no glasses") 642 | 643 | if "holding" in desc: 644 | # Extract what's being held 645 | for item in ["cup", "mug", "phone", "bag", "tool", "book", "pen", "bottle", "box"]: 646 | if item in desc: 647 | hints.append(f"holding {item}") 648 | 649 | return ", ".join(hints) if hints else "" 650 | 651 | def _build_frame_prompt(self, frame: FramePlan, sequence: PopcornSequence, first_frame_details: Optional[str] = None) -> str: 652 | """Build detailed prompt for frame generation with detail consistency""" 653 | 654 | # Build base components 655 | prompt_parts = [ 656 | f"{sequence.style} style", 657 | f"{frame.shot_type.replace('_', ' ')} from {frame.camera_angle.replace('_', ' ')}", 658 | frame.description, 659 | ] 660 | 661 | # Add detail consistency for frames 2+ 662 | if first_frame_details and frame.frame_number > 1: 663 | prompt_parts.append(f"MAINTAIN EXACT DETAILS FROM FRAME 1: {first_frame_details}") 664 | 665 | prompt_parts.extend([ 666 | frame.composition_notes, 667 | sequence.consistency_rules, 668 | "subject naturally integrated in scene with proper lighting", 669 | "match scene lighting and atmosphere", 670 | "blend seamlessly with environment", 671 | "professional storyboard frame", 672 | "consistent with reference images", 673 | "cinematic framing", 674 | "high quality" 675 | ]) 676 | 677 | return ", ".join([p for p in prompt_parts if p]) 678 | 679 | 680 | 681 | 682 | 683 | # ============================================================================ 684 | # MAIN ORCHESTRATOR 685 | # ============================================================================ 686 | 687 | class PopcornGenerator: 688 | """Main orchestrator for Popcorn-style generation""" 689 | 690 | def __init__(self, muapi_key: str, output_dir: str, style: str = "cinematic realistic"): 691 | self.llm = LLMClient(muapi_key, model="gpt-5-mini") 692 | self.vision = VisionAdapter(muapi_key) 693 | self.image_gen = NanoBananaImageGenerator(muapi_key) 694 | self.output_dir = Path(output_dir) 695 | self.output_dir.mkdir(parents=True, exist_ok=True) 696 | self.style = style 697 | 698 | self.analyzer = ReferenceAnalyzer(self.vision, self.llm) 699 | self.planner = SequencePlanner(self.llm) 700 | self.bg_generator = BackgroundGenerator(self.image_gen, self.output_dir, style) 701 | self.generator = FrameGenerator(self.image_gen, self.output_dir, style) 702 | 703 | async def generate( 704 | self, 705 | prompt: str, 706 | reference_urls: List[str], 707 | num_frames: int = 6, 708 | manual_shots: Optional[List[str]] = None 709 | ) -> Dict: 710 | """Generate a Popcorn-style sequence""" 711 | logger.info(f"🍿 Popcorn Generator Starting...") 712 | if manual_shots: 713 | logger.info(f" Mode: MANUAL ({len(manual_shots)} shots)") 714 | else: 715 | logger.info(f" Mode: AUTO (Prompt: {prompt})") 716 | logger.info(f" Frames: {num_frames}") 717 | 718 | logger.info(f" References: {len(reference_urls)}") 719 | logger.info(f" Style: {self.style}") 720 | 721 | # Step 1: Analyze references 722 | logger.info("\n📸 Step 1: Analyzing reference images...") 723 | references = [] 724 | for i, url in enumerate(reference_urls): 725 | ref = await self.analyzer.analyze_reference(url, i) 726 | references.append(ref) 727 | logger.info(f" ✓ {ref.type}: {ref.description[:60]}...") 728 | 729 | # Step 2: Plan sequence 730 | logger.info("\n🎬 Step 2: Planning shot sequence...") 731 | if manual_shots: 732 | sequence = await self.planner.plan_manual_sequence(manual_shots, references, self.style) 733 | else: 734 | sequence = await self.planner.plan_sequence(prompt, num_frames, references, self.style) 735 | 736 | logger.info(f" ✓ Planned {len(sequence.frames)} frames") 737 | logger.info(f" Consistency: {sequence.consistency_rules}") 738 | 739 | # Step 3: Generate backgrounds 740 | bg_data = {} 741 | logger.info("\n🏙️ Step 3: Generating backgrounds...") 742 | bg_data = await self.bg_generator.generate_backgrounds(sequence) 743 | logger.info(f" ✓ Generated {len(bg_data)} backgrounds") 744 | 745 | # Step 4: Generate frames 746 | logger.info("\n🖼️ Step 4: Generating frames...") 747 | frame_paths = await self.generator.generate_frames(sequence, bg_data) 748 | 749 | logger.info(f" ✓ Generated {len(frame_paths)} frames") 750 | 751 | # Step 5: Save sequence data 752 | sequence_file = self.output_dir / "sequence.json" 753 | with open(sequence_file, 'w') as f: 754 | json.dump({ 755 | "prompt": prompt if not manual_shots else "Manual Mode", 756 | "manual_shots": manual_shots, 757 | "num_frames": len(sequence.frames), 758 | "references": [ref.model_dump() for ref in references], 759 | "frames": [frame.model_dump() for frame in sequence.frames], 760 | "backgrounds": [bg.model_dump() for bg in sequence.backgrounds], 761 | "frame_paths": frame_paths, 762 | "bg_paths": bg_data, 763 | "style": self.style, 764 | "consistency_rules": sequence.consistency_rules 765 | }, f, indent=2) 766 | 767 | logger.info(f"\n✅ Sequence complete!") 768 | logger.info(f" Frames: {self.output_dir}/frames/") 769 | logger.info(f" Data: {sequence_file}") 770 | 771 | return { 772 | "sequence": sequence, 773 | "frame_paths": frame_paths, 774 | "output_dir": str(self.output_dir) 775 | } 776 | 777 | 778 | # ============================================================================ 779 | # CLI 780 | # ============================================================================ 781 | 782 | async def main(): 783 | parser = argparse.ArgumentParser(description="Popcorn Storyboard Generator") 784 | 785 | parser.add_argument( 786 | "--prompt", 787 | help="Text prompt describing the scene/action (Required for Auto Mode)" 788 | ) 789 | 790 | parser.add_argument( 791 | "--manual_shots", 792 | nargs="+", 793 | help="List of manual shot descriptions (Enables Manual Mode)" 794 | ) 795 | 796 | parser.add_argument( 797 | "--references", 798 | nargs="+", 799 | default=[], 800 | help="Reference image URLs (up to 4 recommended)" 801 | ) 802 | 803 | parser.add_argument( 804 | "--frames", 805 | type=int, 806 | default=6, 807 | help="Number of frames to generate (Auto Mode only)" 808 | ) 809 | 810 | parser.add_argument( 811 | "--style", 812 | default="cinematic realistic", 813 | help="Visual style (e.g., 'anime', 'watercolor', 'noir film')" 814 | ) 815 | 816 | 817 | 818 | parser.add_argument( 819 | "--output", 820 | default="popcorn_output", 821 | help="Output directory" 822 | ) 823 | 824 | 825 | 826 | args = parser.parse_args() 827 | 828 | # Validate 829 | if not args.prompt and not args.manual_shots: 830 | print("Error: Must provide either --prompt OR --manual_shots") 831 | sys.exit(1) 832 | 833 | if args.manual_shots and len(args.manual_shots) < 1: 834 | print("Error: Manual mode requires at least 1 shot description") 835 | sys.exit(1) 836 | 837 | if args.frames < 2 or args.frames > 12: 838 | if not args.manual_shots: # Only enforce for auto mode 839 | print("Error: Frames must be between 2-12") 840 | sys.exit(1) 841 | 842 | if len(args.references) > 4: 843 | print("Warning: More than 4 references may reduce consistency") 844 | 845 | # Get API keys 846 | muapi_key = MUAPIAPP_API_KEY 847 | 848 | if not muapi_key: 849 | print("Error: MUAPIAPP_API_KEY missing in secrets.py") 850 | sys.exit(1) 851 | 852 | # Create output directory 853 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 854 | output_dir = Path(args.output) / f"sequence_{timestamp}" 855 | 856 | # Generate 857 | generator = PopcornGenerator(muapi_key, str(output_dir), args.style) 858 | 859 | try: 860 | result = await generator.generate( 861 | prompt=args.prompt if args.prompt else "Manual Mode", 862 | reference_urls=args.references, 863 | num_frames=args.frames if not args.manual_shots else len(args.manual_shots), 864 | manual_shots=args.manual_shots 865 | ) 866 | print(f"\n🎉 Success! Frames saved to: {result['output_dir']}") 867 | 868 | except Exception as e: 869 | logger.error(f"Generation failed: {e}") 870 | import traceback 871 | traceback.print_exc() 872 | sys.exit(1) 873 | 874 | 875 | if __name__ == "__main__": 876 | asyncio.run(main()) 877 | --------------------------------------------------------------------------------