├── static ├── webfonts │ ├── fa-solid-900.woff2 │ ├── fa-brands-400.woff2 │ ├── fa-regular-400.woff2 │ └── fa-v4compatibility.woff2 ├── icons │ └── icon.svg └── css │ └── style.css ├── .gitignore ├── feeds ├── default.py ├── tech.py └── brasil.py ├── templates ├── head.html ├── view_brief.html ├── header.html ├── index.html ├── add_article.html ├── view_article.html └── articles.html ├── .env.example ├── requirements.txt ├── CHANGELOG.md ├── config_base.py ├── models.py ├── utils.py ├── migrate.py ├── README.md ├── app.py ├── database.py ├── run_briefing.py └── LICENSE /static/webfonts/fa-solid-900.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lfzawacki/meridiano/HEAD/static/webfonts/fa-solid-900.woff2 -------------------------------------------------------------------------------- /static/webfonts/fa-brands-400.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lfzawacki/meridiano/HEAD/static/webfonts/fa-brands-400.woff2 -------------------------------------------------------------------------------- /static/webfonts/fa-regular-400.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lfzawacki/meridiano/HEAD/static/webfonts/fa-regular-400.woff2 -------------------------------------------------------------------------------- /static/webfonts/fa-v4compatibility.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lfzawacki/meridiano/HEAD/static/webfonts/fa-v4compatibility.woff2 -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Environment variables 2 | .env 3 | 4 | # Python virtual environment 5 | venv/ 6 | */venv/ 7 | 8 | # Python cache files 9 | 
__pycache__/ 10 | *.py[cod] 11 | *$py.class 12 | 13 | # Database file 14 | meridian.db 15 | *.sqlite 16 | *.sqlite3 17 | 18 | # OS generated files 19 | .DS_Store 20 | Thumbs.db 21 | .aider* 22 | -------------------------------------------------------------------------------- /feeds/default.py: -------------------------------------------------------------------------------- 1 | RSS_FEEDS = [ 2 | "http://rss.cnn.com/rss/cnn_world.rss", 3 | "https://feeds.bbci.co.uk/news/world/rss.xml", 4 | "https://www.aljazeera.com/xml/rss/all.xml", 5 | "https://feeds.reuters.com/Reuters/worldNews", 6 | "https://www.nytimes.com/svc/collections/v1/publish/https://www.nytimes.com/section/world/rss.xml" 7 | ] 8 | -------------------------------------------------------------------------------- /templates/head.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | {{title}} | Meridiano 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /.env.example: -------------------------------------------------------------------------------- 1 | # Database Configuration 2 | DATABASE_URL=postgresql://username:password@localhost:5432/meridiano 3 | # For SQLite (development): DATABASE_URL=sqlite:///meridian.db 4 | 5 | # API Keys 6 | DEEPSEEK_API_KEY=your_deepseek_api_key_here 7 | EMBEDDING_API_KEY=your_embedding_api_key_here 8 | 9 | # Flask Configuration 10 | FLASK_SECRET_KEY=your_secret_key_here 11 | FLASK_ENV=development 12 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # meridiano/requirements.txt 2 | feedparser 3 | requests 4 | trafilatura 5 | scikit-learn 6 | numpy # Required by scikit-learn 7 | Flask 8 | python-dotenv 9 | markdown # For rendering Markdown in Flask 10 | schedule # Optional: if using Python-based scheduling 11 | openai # Add openai to requirements 12 | 
beautifulsoup4 13 | lxml 14 | lxml_html_clean 15 | # SQLModel and PostgreSQL dependencies 16 | sqlmodel 17 | psycopg2-binary # PostgreSQL adapter 18 | -------------------------------------------------------------------------------- /templates/view_brief.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | {% set title = "Briefing - " ~ generation_time %} 4 | {% include 'head.html' %} 5 | 6 |
7 | {% include 'header.html' %} 8 | 9 |

Intelligence Brief

10 |

Generated: {{ generation_time }} (ID: {{ brief_id }})

11 |
12 |
13 | {{ brief_content|safe }} 14 |
15 |
16 | 17 | 18 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | ## 0.0.1 [unreleased] 4 | 5 | ### Fixed 6 | 7 | - Fixed date filter type mismatch - updated database functions to accept `date` objects instead of strings 8 | - Fixed database session management - switched to context manager pattern to prevent connection leaks 9 | - Added validation for empty/invalid embedding responses to prevent clustering failures 10 | - Improved rating parsing with regex to handle multi-line LLM responses 11 | - Added logging for PostgreSQL sequence sync errors instead of silently ignoring them 12 | 13 | ### Changed 14 | 15 | - Added logging infrastructure to `database.py` 16 | - Updated session management in `app.py` 17 | - Enhanced parsing logic and error handling in `run_briefing.py` 18 | 19 | 20 | -------------------------------------------------------------------------------- /templates/header.html: -------------------------------------------------------------------------------- 1 | {% set profile_params = {'feed_profile': current_feed_profile} if current_feed_profile is defined and current_feed_profile else {} %} 2 |
3 |
4 | 8 | 12 | 17 |
18 |
19 | -------------------------------------------------------------------------------- /static/icons/icon.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | -------------------------------------------------------------------------------- /templates/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | {% set title = "Meridiano Briefings" %} 4 | {% include 'head.html' %} 5 | 6 |
7 | {% include 'header.html' %} 8 | 9 |

Generated Briefings

10 | 11 |
12 | Filter by Profile: 13 | All 15 | {% for profile in available_profiles %} 16 | 18 | {{ profile }} 19 | 20 | {% endfor %} 21 |
22 | 23 | {% if briefs %} 24 | 35 | {% else %} 36 |

No briefings have been generated yet. Run python run_briefing.py.

37 | {% endif %} 38 | 39 |
40 | 41 | 42 | -------------------------------------------------------------------------------- /templates/add_article.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | {% set title = "Add Article" %} 4 | {% include 'head.html' %} 5 | 6 |
7 | {% include 'header.html' %} 8 | 9 |

Add New Article Manually

10 | 11 | {# --- Flash Messages --- #} 12 | {% with messages = get_flashed_messages(with_categories=true) %} 13 | {% if messages %} 14 |
15 | {% for category, message in messages %} 16 |
{{ message }}
17 | {% endfor %} 18 |
19 | {% endif %} 20 | {% endwith %} 21 | {# --- End Flash Messages --- #} 22 | 23 |
24 |
25 | 26 | 28 | Enter the full URL of the article you want to add. 29 |
30 | 31 |
32 | 33 | 41 | 42 | Select an existing profile, or leave as '{{ default_manual_profile | capitalize }}'. 43 | The article will be processed and potentially included in briefs for this profile. 44 | 45 |
46 | 47 | 48 |
49 | 50 |
51 | 52 | 53 | -------------------------------------------------------------------------------- /feeds/tech.py: -------------------------------------------------------------------------------- 1 | RSS_FEEDS = [ 2 | "https://techcrunch.com/feed/", 3 | "https://www.theverge.com/rss/index.xml", 4 | "https://arstechnica.com/feed/", 5 | "https://krebsonsecurity.com/feed/", 6 | "https://feeds.feedburner.com/TheHackersNews", 7 | "https://www.bleepingcomputer.com/feed/", 8 | "https://www.tomshardware.com/feeds/all", 9 | "https://www.scmp.com/rss/36/feed", # tech 10 | "https://www.scmp.com/rss/320663/feed", # china tech 11 | "https://www.scmp.com/rss/318220/feed", # startups 12 | "https://www.scmp.com/rss/318221/feed", # apps and gaming 13 | "https://www.scmp.com/rss/318224/feed", # science and research 14 | "https://www.scmp.com/rss/318222/feed", # innovation 15 | "https://www.wired.com/feed/category/backchannel/latest/rss", 16 | "https://www.wired.com/feed/rss", 17 | "https://economictimes.indiatimes.com/tech/rssfeeds/13357270.cms", 18 | "https://www.404media.co/rss", 19 | "https://theintercept.com/feed/", 20 | ] 21 | 22 | # Used in process_articles (operates globally, so uses default) 23 | PROMPT_ARTICLE_SUMMARY = "Summarize the key points of this news article objectively in 2-4 sentences. Identify the main topics covered.\n\nArticle:\n{article_content}" 24 | 25 | # Used in rate_articles (operates globally, so uses default) 26 | PROMPT_IMPACT_RATING = """Analyze the following article summary and estimate its overall impact. Consider factors like newsworthiness, originality, geographic scope (local vs global), number of people affected, severity, and potential long-term consequences. Be extremely critical and conservative when assigning scores—higher scores should reflect truly exceptional or rare events. 27 | 28 | Rate the impact on a scale of 1 to 10, using these guidelines: 29 | 30 | 1-2: Minimal significance. Niche interest or local news with no broader relevance. 
Example: A review of a local restaurant or a minor product launch. 31 | 32 | 3-4: Regionally notable. Pop culture happenings, local events, or community-focused stories. Example: A local mayor’s resignation or a regional festival. 33 | 34 | 5-6: Regionally significant or moderately global. Affects multiple communities or industries. Example: A nationwide strike or a major company bankruptcy. 35 | 36 | 7-8: Highly significant. Major international relevance, significant disruptions, or wide-reaching implications. Example: A large-scale natural disaster, global health alerts, or a major geopolitical shift. 37 | 38 | 9-10: Extraordinary and historic. Global, severe, and long-lasting implications. Example: Declaration of war, groundbreaking global treaties, or critical climate crises. 39 | 40 | Key Reminder: Scores of 9-10 should be exceedingly rare and reserved for world-defining events. Always err on the side of a lower score unless the impact is undeniably significant. 41 | 42 | Summary: 43 | "{summary}" 44 | 45 | Output ONLY the integer number representing your rating (1-10).""" 46 | 47 | # Used in generate_brief (can be overridden per profile) 48 | # Use default 49 | 50 | # Used in generate_brief (can be overridden per profile) 51 | PROMPT_BRIEF_SYNTHESIS = """ 52 | You are an AI assistant writing a daily intelligence briefing for a tech and politics youtuber using Markdown. The quality of this briefing is vital for the development of the channel. Synthesize the following analyzed news clusters into a coherent, high-level executive summary. Start with the 2-3 most critical overarching themes globally based *only* on these inputs. Then, provide concise bullet points summarizing key developments within the most significant clusters (roughly 7-10 clusters) and a paragraph summarizing connections and conclusions between the points. Maintain an objective, analytical tone. 
Avoid speculation.\n\n 53 | 54 | Analyzed News Clusters (Most significant first): 55 | {cluster_analyses_text} 56 | """ 57 | -------------------------------------------------------------------------------- /config_base.py: -------------------------------------------------------------------------------- 1 | # simple-meridian/config.py 2 | 3 | import os 4 | from dotenv import load_dotenv 5 | 6 | # Load environment variables 7 | load_dotenv() 8 | 9 | # Used in process_articles (operates globally, so uses default) 10 | PROMPT_ARTICLE_SUMMARY = """ 11 | Summarize the key points of this news article objectively in 2-4 sentences. 12 | Identify the main topics covered. 13 | 14 | Article: 15 | {article_content} 16 | """ 17 | 18 | # Used in rate_articles (operates globally, so uses default) 19 | PROMPT_IMPACT_RATING = """ 20 | Analyze the following news summary and estimate its overall impact. Consider factors like geographic scope (local vs global), number of people affected, severity, and potential long-term consequences. 21 | 22 | Rate the impact on a scale of 1 to 10, where: 23 | 1-2: Minor, niche, or local interest. 24 | 3-4: Notable event for a specific region or community. 25 | 5-6: Significant event with broader regional or moderate international implications. 26 | 7-8: Major event with significant international importance or wide-reaching effects. 27 | 9-10: Critical global event with severe, widespread, or potentially historic implications. 28 | 29 | Summary: 30 | "{summary}" 31 | 32 | Output ONLY the integer number representing your rating (1-10). 33 | """ 34 | 35 | # Used in generate_brief (can be overridden per profile) 36 | PROMPT_CLUSTER_ANALYSIS = """ 37 | These are summaries of potentially related news articles from a '{feed_profile}' context: 38 | 39 | {cluster_summaries_text} 40 | 41 | What is the core event or topic discussed? Summarize the key developments and significance in 3-5 sentences based *only* on the provided text. 
If the articles seem unrelated, state that clearly. 42 | """ 43 | 44 | # Used in generate_brief (can be overridden per profile) 45 | PROMPT_BRIEF_SYNTHESIS = """ 46 | You are an AI assistant writing a Presidential-style daily intelligence briefing using Markdown, specifically for the '{feed_profile}' category. 47 | Synthesize the following analyzed news clusters into a coherent, high-level executive summary. 48 | Start with the 2-3 most critical overarching themes globally or within this category based *only* on these inputs. 49 | Then, provide concise bullet points summarizing key developments within the most significant clusters (roughly 3-5 clusters). 50 | Maintain an objective, analytical tone relevant to the '{feed_profile}' context. Avoid speculation. 51 | 52 | Analyzed News Clusters (Most significant first): 53 | {cluster_analyses_text} 54 | """ 55 | 56 | # --- Processing Settings --- 57 | # How many hours back to look for articles when generating a brief 58 | BRIEFING_ARTICLE_LOOKBACK_HOURS = 24 59 | 60 | # --- Model Settings --- 61 | # Model for summarization and analysis (check Deepseek docs for latest models) 62 | DEEPSEEK_CHAT_MODEL = "deepseek-chat" 63 | # Model for embeddings 64 | EMBEDDING_MODEL = "intfloat/multilingual-e5-large-instruct" 65 | 66 | # Approximate number of clusters to aim for. Fine-tune based on results. 67 | # Alternatively, use algorithms like DBSCAN that don't require specifying k. 
68 | N_CLUSTERS = 10 # Example, adjust as needed 69 | 70 | # Minimum number of articles required to attempt clustering/briefing 71 | MIN_ARTICLES_FOR_BRIEFING = 5 72 | 73 | ARTICLES_PER_PAGE = 15 74 | 75 | MANUALLY_ADDED_PROFILE_NAME = 'manual' 76 | DEFAULT_FEED_PROFILE = 'default' 77 | 78 | # --- Other --- 79 | DATABASE_FILE = "meridian.db" # Keep for backward compatibility 80 | 81 | # Database configuration 82 | DATABASE_URL = os.getenv("DATABASE_URL", f"sqlite:///{DATABASE_FILE}") 83 | 84 | # API Keys 85 | DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY") 86 | EMBEDDING_API_KEY = os.getenv("EMBEDDING_API_KEY") 87 | 88 | # Flask configuration 89 | FLASK_SECRET_KEY = os.getenv( 90 | "FLASK_SECRET_KEY", "dev-key-change-in-production" 91 | ) 92 | FLASK_ENV = os.getenv("FLASK_ENV", "development") 93 | -------------------------------------------------------------------------------- /feeds/brasil.py: -------------------------------------------------------------------------------- 1 | RSS_FEEDS = [ 2 | "https://feeds.folha.uol.com.br/emcimadahora/rss091.xml", # Folha de S.Paulo 3 | "https://agenciabrasil.ebc.com.br/rss/politica.xml", # Agência Brasil 4 | "https://reporterbrasil.org.br/feed-rss/", # Repórter Brasil 5 | "https://www.brasildefato.com.br/rss", # Brasil de Fato 6 | "https://www.intercept.com.br/feed/", # The Intercept 7 | "https://apublica.org/feed/", # Agencia Pública 8 | "https://elpais.com/tag/rss/brasil/", # El País Brasil 9 | "https://rss.app/feeds/HSToR6aoxLq2IUTt.xml", # Alma Preta 10 | "https://www.nexojornal.com.br/rss.xml", # Nexo Jornal 11 | "https://outraspalavras.net/feed/", # Outras Palavras 12 | "https://jacobin.com.br/feed/", # Jacobin Brasil 13 | "https://averdade.org.br/feed/", # Jornal A Verdade 14 | "https://revistaforum.com.br/rss/feed.html", # Revista Fórum 15 | "https://operamundi.uol.com.br/feed/", # Opera Mundi 16 | ] 17 | 18 | pt_br = " Responda em português brasileiro." 
19 | 20 | # Used in process_articles (operates globally, so uses default) 21 | PROMPT_ARTICLE_SUMMARY = "Resuma os pontos principais desta notícia objetivamente em 2 a 4 frases. Identifique os principais tópicos abordados.\n\nArtigo:\n{article_content}." + pt_br 22 | 23 | # Used in rate_articles (operates globally, so uses default) 24 | PROMPT_IMPACT_RATING = """Analise o resumo da notícia a seguir e estime seu impacto no contexto brasileiro. Considere fatores como noticiabilidade, relevância para o público brasileiro, abrangência geográfica (local, regional ou nacional), número de pessoas afetadas, gravidade e potenciais consequências a longo prazo para o Brasil. Seja extremamente crítico e conservador ao atribuir pontuações — pontuações mais altas devem refletir eventos verdadeiramente excepcionais ou raros dentro da realidade brasileira. 25 | 26 | Avalie o impacto em uma escala de 1 a 10, usando estas diretrizes: 27 | 28 | 1-2: Significância mínima. Interesse de nicho ou notícias locais sem relevância mais ampla. Exemplo: Um evento cultural local ou a abertura de um pequeno comércio. 29 | 30 | 3-4: Notável regionalmente. Acontecimentos de relevância em um estado ou região específica. Exemplo: Mudanças na administração de uma cidade importante ou eventos regionais de grande participação. 31 | 32 | 5-6: Significativo nacionalmente. Afeta múltiplos estados ou tem relevância nacional moderada. Exemplo: Greves de categorias importantes ou mudanças significativas em políticas públicas regionais. 33 | 34 | 7-8: Altamente significativo no Brasil. Grande relevância nacional, interrupções significativas ou implicações de longo alcance. Exemplo: Um desastre natural em grande escala, crises políticas de grande impacto ou escândalos nacionais. 35 | 36 | 9-10: Extraordinário e histórico no contexto brasileiro. Implicações nacionais graves e duradouras. Exemplo: Mudanças constitucionais marcantes, crises econômicas severas ou eventos históricos que redefinem o país. 
37 | 38 | Lembrete importante: Pontuações de 9 a 10 devem ser extremamente raras e reservadas para eventos que definem o Brasil. Sempre opte por uma pontuação menor, a menos que o impacto seja inegavelmente significativo. 39 | 40 | Resumo: 41 | "{summary}" 42 | 43 | Digite SOMENTE o número inteiro que representa sua classificação (1 a 10). 44 | """ 45 | 46 | # Used in generate_brief (can be overridden per profile) 47 | PROMPT_CLUSTER_ANALYSIS = """ 48 | Estes são resumos de artigos de notícias potencialmente relacionados de um contexto '{feed_profile}': 49 | 50 | {cluster_summaries_text} 51 | 52 | Qual é o evento ou tópico principal discutido? Resuma os principais desenvolvimentos e a importância em 3 a 5 frases, com base *apenas* no texto fornecido. Se os artigos parecerem não relacionados, informe isso claramente. 53 | """ + pt_br 54 | 55 | # Used in generate_brief (can be overridden per profile) 56 | PROMPT_BRIEF_SYNTHESIS = """ 57 | Você é um assistente de IA escrevendo um briefing diário de inteligência no estilo presidencial usando Markdown, especificamente para a categoria '{feed_profile}'. 58 | Sintetize os seguintes grupos de notícias analisados em um resumo executivo coerente e de alto nível que será apresentado em formato profissional. 59 | 60 | Comece com os 4 ou 5 temas abrangentes mais críticos em relação ao Brasil ou dentro desta categoria, com base *apenas* nestas informações. 61 | 62 | Em seguida, forneça tópicos concisos resumindo os principais desenvolvimentos dentro dos grupos mais significativos (aproximadamente 5 a 7 grupos). 63 | Mantenha um tom objetivo e analítico relevante para o contexto '{feed_profile}'. Evite especulações. 
64 | 65 | Grupos de Notícias Analisados (Mais significativos primeiro): 66 | {cluster_analyses_text} 67 | """ 68 | -------------------------------------------------------------------------------- /templates/view_article.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | {% set title = "Article: %s" % article.get('title', 'Untitled') %} 4 | {% include 'head.html' %} 5 | 6 |
7 | {% include 'header.html' %} 8 | 9 |
10 | {% if article.image_url %} 11 | 12 | Image for {{ title }} 16 | 17 | 20 | {% else %} 21 |
22 | No Image Available 23 |
24 | {% endif %} 25 |
26 | 27 |

Article Details

28 | 29 |
30 |
Title
31 |
{{ title }}
32 | 33 |
Impact Score
34 |
35 | {% if article.impact_score is not none %} 36 | 37 | {{ article.impact_score }} 38 | / 10 39 | {% else %} 40 | - (Not Rated) 41 | {% endif %} 42 |
43 | 44 |
AI Summary (Processed Content)
45 |
46 | {% if article.processed_content_html %} 47 | {{ article.processed_content_html | safe }} 48 | {% else %} 49 | Not Processed / Summarized Yet 50 | {% endif %} 51 |
52 | 53 |
Original URL
54 |
{{ article['url'] }}
55 | 56 |
Source Feed
57 |
{{ article['feed_source'] | default('N/A') }}
58 | 59 |
Published Date
60 |
{{ article['published_date'] | datetimeformat | default('N/A') }}
61 | 62 |
Fetched Date
63 |
{{ article['fetched_at'] | datetimeformat | default('N/A') }}
64 | 65 |
Processed Date
66 |
{{ article['processed_at'] | datetimeformat | default('N/A') }}
67 | 68 |
Embedding Status
69 |
{{ embedding_status }}
70 | 71 |
Cluster ID
72 |
{{ article['cluster_id'] if article['cluster_id'] is not none else 'Not Clustered' }}
73 | 74 |
75 | Raw Extracted Content 76 | 77 | 78 | 79 | 80 |
81 | 88 |
89 | 90 |
91 | 108 | 109 | 110 | 111 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | """ 2 | SQLModel database models for Meridiano application. 3 | """ 4 | 5 | from datetime import datetime 6 | from typing import Optional 7 | 8 | from sqlmodel import Field, Session, SQLModel, create_engine, text 9 | 10 | import config_base as config 11 | 12 | 13 | class Article(SQLModel, table=True): 14 | """Article model representing news articles in the database.""" 15 | 16 | # Old SQLite schema for reference 17 | """ 18 | CREATE TABLE IF NOT EXISTS articles ( 19 | id INTEGER PRIMARY KEY AUTOINCREMENT, /* id is alias for rowid */ 20 | url TEXT UNIQUE NOT NULL, 21 | title TEXT, 22 | published_date DATETIME, 23 | feed_source TEXT, 24 | fetched_at DATETIME DEFAULT CURRENT_TIMESTAMP, 25 | raw_content TEXT, 26 | processed_content TEXT, 27 | embedding TEXT, 28 | processed_at DATETIME, 29 | cluster_id INTEGER, 30 | impact_score INTEGER, 31 | image_url TEXT, 32 | feed_profile TEXT NOT NULL DEFAULT 'default' 33 | ); 34 | CREATE INDEX IF NOT EXISTS idx_articles_url ON articles (url); 35 | CREATE INDEX IF NOT EXISTS idx_articles_processed_at ON articles (processed_at); 36 | CREATE INDEX IF NOT EXISTS idx_articles_published_date ON articles (published_date); 37 | """ 38 | 39 | __tablename__ = "articles" 40 | 41 | id: Optional[int] = Field(default=None, primary_key=True) 42 | url: str = Field(unique=True, index=True) 43 | title: Optional[str] = None 44 | published_date: Optional[datetime] = None 45 | feed_source: Optional[str] = None 46 | fetched_at: datetime = Field(default_factory=datetime.now) 47 | raw_content: Optional[str] = None 48 | processed_content: Optional[str] = None 49 | embedding: Optional[str] = None # JSON string 50 | processed_at: Optional[datetime] = Field(default=None, index=True) 51 | cluster_id: Optional[int] = None 52 | impact_score: Optional[int] = 
None 53 | image_url: Optional[str] = None 54 | feed_profile: str = Field(default="default", index=True) 55 | 56 | 57 | class Brief(SQLModel, table=True): 58 | """Brief model representing generated news briefs.""" 59 | 60 | # Old SQLite schema for reference 61 | """ 62 | # Briefs Table 63 | CREATE TABLE IF NOT EXISTS briefs ( 64 | id INTEGER PRIMARY KEY AUTOINCREMENT, 65 | generated_at DATETIME DEFAULT CURRENT_TIMESTAMP, 66 | brief_markdown TEXT NOT NULL, 67 | contributing_article_ids TEXT, 68 | feed_profile TEXT NOT NULL DEFAULT 'default' 69 | ) 70 | """ 71 | 72 | __tablename__ = "briefs" 73 | 74 | id: Optional[int] = Field(default=None, primary_key=True) 75 | generated_at: datetime = Field(default_factory=datetime.now) 76 | brief_markdown: str 77 | contributing_article_ids: Optional[str] = None # JSON string 78 | feed_profile: str = Field(default="default", index=True) 79 | 80 | 81 | # Database engine and session management 82 | engine = create_engine(config.DATABASE_URL, echo=False) 83 | 84 | 85 | def create_db_and_tables(): 86 | """Create database tables if they don't exist.""" 87 | SQLModel.metadata.create_all(engine) 88 | 89 | # Old SQLite schema for reference (replaced by to_tsvector in PostgreSQL) 90 | """ 91 | # --- FTS5 Virtual Table --- 92 | # Create the virtual table to index title and raw_content from articles 93 | # content='' means it doesn't store the content itself, only index 94 | # content_rowid='id' links the FTS rowid to the articles table id column 95 | CREATE VIRTUAL TABLE IF NOT EXISTS articles_fts USING fts5( 96 | title, 97 | raw_content, 98 | content='articles', 99 | content_rowid='id' 100 | ) 101 | 102 | # --- Triggers to keep FTS table synchronized --- 103 | # After inserting into articles, insert into articles_fts 104 | CREATE TRIGGER IF NOT EXISTS articles_ai AFTER INSERT ON articles BEGIN 105 | INSERT INTO articles_fts (rowid, title, raw_content) 106 | VALUES (new.id, new.title, new.raw_content); 107 | END; 108 | 109 | # Before 
deleting from articles, delete from articles_fts 110 | # Need old.id to identify the row in articles_fts 111 | CREATE TRIGGER IF NOT EXISTS articles_ad BEFORE DELETE ON articles BEGIN 112 | DELETE FROM articles_fts WHERE rowid=old.id; 113 | END; 114 | 115 | # After updating articles, update articles_fts 116 | CREATE TRIGGER IF NOT EXISTS articles_au AFTER UPDATE ON articles BEGIN 117 | UPDATE articles_fts SET title=new.title, raw_content=new.raw_content 118 | WHERE rowid=old.id; 119 | END; 120 | # --- End FTS Setup --- 121 | """ 122 | 123 | # For PostgreSQL, create full-text search index 124 | if "postgresql" in config.DATABASE_URL.lower(): 125 | with Session(engine) as session: 126 | try: 127 | # Create full-text search index 128 | session.exec( 129 | text(""" 130 | CREATE INDEX IF NOT EXISTS idx_articles_fts 131 | ON articles USING GIN( 132 | to_tsvector('english', 133 | coalesce(title, '') || ' ' || coalesce(raw_content, '') 134 | ) 135 | ) 136 | """) 137 | ) 138 | session.commit() 139 | print("PostgreSQL full-text search index created") 140 | except Exception as e: 141 | print(f"Note: FTS index creation: {e}") 142 | session.rollback() 143 | 144 | 145 | def get_session(): 146 | """Get a database session.""" 147 | return Session(engine) 148 | 149 | 150 | def init_db(): 151 | """Initialize the database - create all tables.""" 152 | create_db_and_tables() 153 | print("Database initialized with SQLModel.") 154 | 155 | 156 | if __name__ == "__main__": 157 | init_db() 158 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import trafilatura 2 | import requests 3 | from datetime import datetime 4 | from bs4 import BeautifulSoup 5 | import logging 6 | from urllib.parse import urljoin 7 | 8 | logging.basicConfig( 9 | level=logging.INFO, 10 | format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', 11 | datefmt='%Y-%m-%d %H:%M:%S' 12 | ) 13 
| 14 | logger = logging.getLogger() 15 | 16 | # Helper function for date formatting (optional but nice) 17 | def format_datetime(value, format='%Y-%m-%d %H:%M'): 18 | if value is None: 19 | return "N/A" 20 | if isinstance(value, str): 21 | try: 22 | value = datetime.fromisoformat(value) 23 | except ValueError: 24 | return value # Return original string if parsing fails 25 | if isinstance(value, datetime): 26 | return value.strftime(format) 27 | return value 28 | 29 | def fetch_article_content_and_og_image(url): 30 | """ 31 | Fetches HTML, extracts main content using Trafilatura, 32 | and extracts the og:image URL using BeautifulSoup. 33 | 34 | Returns: 35 | dict: {'content': str|None, 'og_image': str|None} 36 | """ 37 | content = None 38 | og_image = None 39 | try: 40 | headers = { 41 | 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:137.0) Gecko/20100101 Firefox/137.0', 42 | "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", 43 | "Accept-Language": "en-US,en;q=0.5", 44 | "Accept-Encoding": "gzip, deflate", 45 | "Connection": "keep-alive", 46 | "Upgrade-Insecure-Requests": "1", 47 | "Sec-Fetch-Dest": "document", 48 | "Sec-Fetch-Mode": "navigate", 49 | "Sec-Fetch-Site": "none", 50 | "Sec-Fetch-User": "?1", 51 | "Cache-Control": "max-age=0", 52 | "referer": "https://www.google.com" 53 | } 54 | response = requests.get(url, headers=headers, timeout=20) # Increased timeout slightly 55 | response.raise_for_status() 56 | html_content = response.text 57 | 58 | # 1. Extract text content 59 | content = trafilatura.extract(html_content, include_comments=False, include_tables=False) 60 | 61 | # 2. 
Extract og:image using BeautifulSoup 62 | soup = BeautifulSoup(html_content, 'lxml') # Use lxml or html.parser 63 | og_image_tag = soup.find('meta', property='og:image') 64 | if og_image_tag and og_image_tag.get('content'): 65 | og_image = og_image_tag['content'] 66 | # Optionally resolve relative URLs - less common for og:image but possible 67 | og_image = urljoin(url, og_image) 68 | 69 | return {'content': content, 'og_image': og_image} 70 | 71 | except requests.exceptions.RequestException as e: 72 | print(f"Error fetching {url}: {e}") 73 | return {'content': None, 'og_image': None} 74 | except Exception as e: 75 | # Catch potential BeautifulSoup errors or others 76 | print(f"Error processing content/og:image from {url}: {e}") 77 | # Still return content if it was extracted before the error 78 | return {'content': content, 'og_image': None} 79 | 80 | def scrape_single_article_details(article_url): 81 | """ 82 | Fetches and extracts details (title, raw_content, image_url) for a single article URL. 83 | 84 | Args: 85 | article_url (str): The URL of the article to scrape. 86 | 87 | Returns: 88 | dict: {'title': str|None, 'raw_content': str|None, 'image_url': str|None, 'error': str|None} 89 | 'error' key will be present if fetching/processing failed. 90 | """ 91 | print(f"Attempting to scrape single article: {article_url}") 92 | fetched_title = None 93 | raw_content = None 94 | final_image_url = None 95 | error_message = None 96 | 97 | try: 98 | # Use the existing fetch_article_content_and_og_image helper 99 | fetch_result = fetch_article_content_and_og_image(article_url) # This already logs its own errors 100 | 101 | raw_content = fetch_result['content'] 102 | og_image_url = fetch_result['og_image'] 103 | 104 | if not raw_content: 105 | # fetch_article_content_and_og_image might have already logged, but good to have a specific error here 106 | error_message = "Failed to extract main content from the article." 
107 | logger.warning(f"{error_message} URL: {article_url}") 108 | # Even if content extraction fails, the page may still yield a title or image 109 | # from OG tags, so continue and try to extract a title. 110 | 111 | # --- Attempt to get a title from the page if not provided by RSS --- 112 | # fetch_article_content_and_og_image does not return the parsed HTML or a soup 113 | # object, so the page must be fetched again here to read its title. 114 | # Re-fetching is wasteful, but it keeps that helper's interface simple. 115 | # If trafilatura extracted content, the page was almost certainly reachable. 116 | # Alternatives: keep and reuse the source HTML from the first fetch, 117 | # or issue a lighter request that only reads the document head. 118 | # A placeholder is used if neither the feed nor the page provides a title. 119 | # A more robust solution would be to enhance fetch_article_content_and_og_image 120 | # to also return the title tag content. 
121 | 122 | if raw_content: # If we got content, try to get title from HTML 123 | try: 124 | # Need to parse the HTML again if not already available from previous fetch 125 | headers = {'User-Agent': 'Mozilla/5.0 ...'} # Your headers 126 | response = requests.get(article_url, headers=headers, timeout=15) 127 | response.raise_for_status() 128 | soup = BeautifulSoup(response.text, 'lxml') 129 | title_tag = soup.find('title') 130 | if title_tag and title_tag.string: 131 | fetched_title = title_tag.string.strip() 132 | # Fallback to OG title if HTML title is poor/missing 133 | if not fetched_title: 134 | og_title_tag = soup.find('meta', property='og:title') 135 | if og_title_tag and og_title_tag.get('content'): 136 | fetched_title = og_title_tag['content'].strip() 137 | except Exception as title_e: 138 | logger.warning(f"Could not extract title for {article_url}: {title_e}") 139 | # Continue without title if it fails 140 | 141 | final_image_url = og_image_url # For a single manual add, OG image is the primary target 142 | 143 | if not raw_content and not fetched_title and not final_image_url: 144 | # If absolutely nothing was fetched, it's a more significant error 145 | error_message = "Failed to fetch any content, title, or image from the URL." 
146 | 147 | 148 | except Exception as e: 149 | error_message = f"General error scraping single article {article_url}: {e}" 150 | logger.error(error_message, exc_info=True) 151 | 152 | return { 153 | 'title': fetched_title, 154 | 'raw_content': raw_content, 155 | 'image_url': final_image_url, 156 | 'error': error_message 157 | } 158 | -------------------------------------------------------------------------------- /migrate.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Migration utility for Meridiano: SQLite to PostgreSQL with SQLModel 4 | """ 5 | 6 | import sqlite3 7 | from datetime import datetime 8 | from pathlib import Path 9 | 10 | from sqlmodel import select 11 | 12 | import config_base as config 13 | from models import Article, Brief, create_db_and_tables, get_session 14 | 15 | def migrate_from_sqlite(): 16 | """ 17 | Migrate data from SQLite database to the configured database (PostgreSQL/SQLite) 18 | """ 19 | # Check if SQLite database exists 20 | sqlite_db_path = config.DATABASE_FILE 21 | if not Path(sqlite_db_path).exists(): 22 | print(f"[ERROR] SQLite database not found at {sqlite_db_path}") 23 | return False 24 | 25 | print(f"[INFO] Migrating from SQLite database: {sqlite_db_path}") 26 | print(f"[INFO] Target database: {config.DATABASE_URL}") 27 | 28 | # Create new database tables 29 | create_db_and_tables() 30 | 31 | # SQLite connection 32 | sqlite_conn = sqlite3.connect(sqlite_db_path) 33 | sqlite_conn.row_factory = sqlite3.Row 34 | 35 | try: 36 | # Migrate Articles 37 | print("\n[INFO] Migrating articles...") 38 | cursor = sqlite_conn.execute("SELECT * FROM articles") 39 | articles_migrated = 0 40 | articles_skipped = 0 41 | 42 | with get_session() as session: 43 | for row in cursor: 44 | row = dict(row) 45 | # Check if article already exists 46 | existing = session.get(Article, row["id"]) 47 | if existing: 48 | print( 49 | f"[SKIP] Article already exists (ID {row['id']}): 
{row['title']}" 50 | ) 51 | articles_skipped += 1 52 | continue 53 | 54 | try: 55 | article = Article( 56 | id=row["id"], # Preserve original ID 57 | url=row["url"], 58 | title=row["title"], 59 | published_date=datetime.fromisoformat( 60 | row["published_date"] 61 | ) 62 | if row["published_date"] 63 | else None, 64 | feed_source=row["feed_source"], 65 | fetched_at=datetime.fromisoformat(row["fetched_at"]) 66 | if row["fetched_at"] 67 | else datetime.now(), 68 | raw_content=row["raw_content"], 69 | processed_content=row["processed_content"], 70 | embedding=row["embedding"], 71 | processed_at=datetime.fromisoformat( 72 | row["processed_at"] 73 | ) 74 | if row["processed_at"] 75 | else None, 76 | cluster_id=row["cluster_id"], 77 | impact_score=row["impact_score"], 78 | image_url=row.get("image_url"), 79 | feed_profile=row.get("feed_profile", "default"), 80 | ) 81 | session.add(article) 82 | articles_migrated += 1 83 | 84 | if articles_migrated % 100 == 0: 85 | print( 86 | f" [INFO] Migrated {articles_migrated} articles..." 
87 | ) 88 | 89 | except Exception as e: 90 | print(f"[ERROR] Error migrating article {row['id']}: {e}") 91 | 92 | session.commit() 93 | 94 | print( 95 | f"[DONE] Articles migration completed: {articles_migrated} migrated, {articles_skipped} skipped" 96 | ) 97 | 98 | # Migrate Briefs 99 | print("\n[INFO] Migrating briefs...") 100 | cursor = sqlite_conn.execute("SELECT * FROM briefs") 101 | briefs_migrated = 0 102 | briefs_skipped = 0 103 | 104 | with get_session() as session: 105 | for row in cursor: 106 | row = dict(row) 107 | # Check if brief already exists 108 | existing = session.get(Brief, row["id"]) 109 | if existing: 110 | print(f"[SKIP] Brief already exists (ID {row['id']})") 111 | briefs_skipped += 1 112 | continue 113 | 114 | try: 115 | brief = Brief( 116 | id=row["id"], # Preserve original ID 117 | generated_at=datetime.fromisoformat( 118 | row["generated_at"] 119 | ) 120 | if row["generated_at"] 121 | else datetime.now(), 122 | brief_markdown=row["brief_markdown"], 123 | contributing_article_ids=row[ 124 | "contributing_article_ids" 125 | ], 126 | feed_profile=row.get("feed_profile", "default"), 127 | ) 128 | session.add(brief) 129 | briefs_migrated += 1 130 | 131 | except Exception as e: 132 | print(f"[ERROR] Error migrating brief {row['id']}: {e}") 133 | 134 | session.commit() 135 | 136 | print( 137 | f"[DONE] Briefs migration completed: {briefs_migrated} migrated, {briefs_skipped} skipped" 138 | ) 139 | 140 | print(f"\n[SUCCESS] Migration completed successfully!") 141 | print( 142 | f" [INFO] Articles: {articles_migrated} migrated, {articles_skipped} skipped" 143 | ) 144 | print( 145 | f" [INFO] Briefs: {briefs_migrated} migrated, {briefs_skipped} skipped" 146 | ) 147 | 148 | return True 149 | 150 | except Exception as e: 151 | print(f"[ERROR] Error during migration: {e}") 152 | return False 153 | finally: 154 | sqlite_conn.close() 155 | 156 | 157 | def setup_postgresql_fts(): 158 | """ 159 | Set up PostgreSQL full-text search indexes (already handled 
in models.py) 160 | """ 161 | if "postgresql" not in config.DATABASE_URL.lower(): 162 | print("[WARN] Not using PostgreSQL, skipping FTS setup") 163 | return 164 | 165 | print("[INFO] PostgreSQL full-text search is configured in models.py") 166 | return True 167 | 168 | 169 | def verify_migration(): 170 | """ 171 | Verify the migration by checking record counts 172 | """ 173 | print("\n[INFO] Verifying migration...") 174 | 175 | # Check SQLite counts 176 | sqlite_db_path = config.DATABASE_FILE 177 | if not Path(sqlite_db_path).exists(): 178 | print("[ERROR] Original SQLite database not found for verification") 179 | return 180 | 181 | sqlite_conn = sqlite3.connect(sqlite_db_path) 182 | 183 | # Count SQLite records 184 | sqlite_articles = sqlite_conn.execute( 185 | "SELECT COUNT(*) FROM articles" 186 | ).fetchone()[0] 187 | sqlite_briefs = sqlite_conn.execute( 188 | "SELECT COUNT(*) FROM briefs" 189 | ).fetchone()[0] 190 | sqlite_conn.close() 191 | 192 | # Count new database records 193 | with get_session() as session: 194 | new_articles = len(session.exec(select(Article)).all()) 195 | new_briefs = len(session.exec(select(Brief)).all()) 196 | 197 | print(f"[INFO] Record counts:") 198 | print(f" Articles: SQLite={sqlite_articles}, New DB={new_articles}") 199 | print(f" Briefs: SQLite={sqlite_briefs}, New DB={new_briefs}") 200 | 201 | if sqlite_articles == new_articles and sqlite_briefs == new_briefs: 202 | print("[SUCCESS] Migration verification passed!") 203 | return True 204 | else: 205 | print("[WARN] Record count mismatch - please review migration") 206 | return False 207 | 208 | 209 | if __name__ == "__main__": 210 | import sys 211 | 212 | if len(sys.argv) < 2: 213 | print("Usage: python migrate.py [migrate|verify|setup_fts]") 214 | print(" migrate - Migrate data from SQLite to configured database") 215 | print(" verify - Verify migration by comparing record counts") 216 | print(" setup_fts - Set up full-text search (PostgreSQL)") 217 | sys.exit(1) 218 | 219 | 
command = sys.argv[1] 220 | 221 | if command == "migrate": 222 | migrate_from_sqlite() 223 | elif command == "verify": 224 | verify_migration() 225 | elif command == "setup_fts": 226 | setup_postgresql_fts() 227 | else: 228 | print( 229 | "[ERROR] Unknown command. Use 'migrate', 'verify', or 'setup_fts'" 230 | ) 231 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Meridiano: Your Personal Intelligence Briefing System 2 | 3 | [![Build Status](https://img.shields.io/badge/build-passing-brightgreen)](#) <!-- Replace with actual build status badge if you set up CI --> 4 | [![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0) 5 | 6 | **AI-powered intelligence briefings, tailored to your interests, built with simple, deployable tech.** 7 | 8 | Meridiano cuts through the news noise by scraping configured sources, analyzing stories with AI (summaries, impact ratings), clustering related events, and delivering concise daily briefs via a web interface. 9 | 10 | Based on the original project <https://github.com/iliane5/meridian> 11 | 12 | https://github.com/user-attachments/assets/2caf1844-5112-4a89-96b5-c018ca308c99 13 | 14 | 15 | ## Why It Exists 16 | 17 | Inspired by the concept of presidential daily briefings, Meridiano aims to provide similar focused intelligence, personalized for individual users. In an era of information overload, it helps you: 18 | 19 | * Stay informed on key global or specific topical events without drowning in noise. 20 | * Understand context beyond headlines through AI analysis. 21 | * Track developing stories via article clustering. 22 | * Leverage AI for summarization and impact assessment. 23 | * Maintain control through customizable feed profiles and open-source code. 
24 | 25 | Built for the curious mind wanting depth and relevance without the endless time sink of manual news consumption. 26 | 27 | ## Key Features 28 | 29 | * **Customizable Sources**: Define RSS feed lists via simple Python configuration files (`feeds/`). 30 | * **Multi-Stage Processing**: Modular pipeline (scrape, process, rate, brief) controllable via CLI. 31 | * **AI Analysis**: Uses Deepseek for summarization, impact rating, cluster analysis, and brief synthesis. Needs another AI provider for embeddings. 32 | * **Configurable Prompts**: Tailor LLM prompts for analysis and synthesis per feed profile. 33 | * **Smart Clustering**: Groups related articles using embeddings (via your chosen API) and KMeans. 34 | * **Impact Rating**: AI assigns a 1-10 impact score to articles based on their summary. 35 | * **Image Extraction**: Attempts to fetch representative images from RSS or article OG tags. 36 | * **FTS5 Search**: Fast and relevant full-text search across article titles and content. 37 | * **Web Interface**: Clean Flask-based UI to browse briefings and articles, with filtering (date, profile), sorting, pagination, and search. 38 | * **Simple Tech**: Built with Python, SQLite, and common libraries for easy setup and deployment. 39 | 40 | ## How It Works 41 | 42 | 1. **Configuration**: Load base settings (`config_base.py`) and feed-specific settings (`feeds/<profile_name>.py`), including RSS feeds and custom prompts. 43 | 2. **CLI Control**: `run_briefing.py` orchestrates the stages based on CLI arguments (`--feed`, `--scrape`, `--process`, `--rate`, `--generate`, `--all`). 44 | 3. **Scraping**: Fetches RSS, extracts article content, attempts to find an image (RSS or OG tag), and saves metadata (including `feed_profile`) to the `articles` table. FTS triggers populate `articles_fts`. 45 | 4. 
**Processing**: Fetches unprocessed articles (per profile), generates summaries (using Deepseek), generates embeddings (using configured provider), and updates the `articles` table. 46 | 5. **Rating**: Fetches unrated articles (per profile), asks Deepseek to rate impact based on summary, and updates the `articles` table. 47 | 6. **Brief Generation**: Fetches recent, processed articles for the specified `feed_profile`, clusters them, analyzes clusters using profile-specific prompts (Deepseek), synthesizes a final brief using profile-specific prompts (Deepseek), and saves it to the `briefs` table. 48 | 7. **Web Interface**: `app.py` (Flask) serves the UI, allowing users to browse briefs and articles, search (FTS), filter (profile, date), sort (date, impact), and paginate results. 49 | 50 | ## Tech Stack 51 | 52 | * **Backend**: Python 3.10+ 53 | * **Database**: SQLite (with FTS5 enabled) 54 | * **Web Framework**: Flask 55 | * **AI APIs**: 56 | * Deepseek API (Summaries, Rating, Analysis, Synthesis) 57 | * Together AI API (Embeddings - or your configured provider) 58 | * **Core Libraries**: 59 | * `feedparser` (RSS handling) 60 | * `requests` (HTTP requests) 61 | * `trafilatura` (Main content extraction) 62 | * `beautifulsoup4` / `lxml` (HTML parsing for OG tags) 63 | * `openai` (Python client for interacting with Deepseek/TogetherAI APIs) 64 | * `scikit-learn`, `numpy` (Clustering) 65 | * `python-dotenv` (Environment variables) 66 | * `argparse` (CLI arguments) 67 | * `markdown` (Rendering content in web UI) 68 | * **Frontend**: HTML, CSS, minimal vanilla JavaScript (for date filter toggle) 69 | 70 | ## Getting Started 71 | 72 | **Prerequisites**: 73 | 74 | * Python 3.10 or later 75 | * Git (optional, for cloning) 76 | * API Keys: 77 | * Deepseek API Key 78 | * Together AI API Key (or key for your chosen embedding provider) 79 | 80 | **Setup**: 81 | 82 | 1. 
**Clone the repository (or download files):** 83 | ```bash 84 | git clone <your-repo-url> meridiano 85 | cd meridiano 86 | ``` 87 | 88 | 2. **Create and activate a virtual environment:** 89 | ```bash 90 | python -m venv venv 91 | # On macOS/Linux: 92 | source venv/bin/activate 93 | # On Windows: 94 | .\venv\Scripts\activate 95 | ``` 96 | 97 | 3. **Install dependencies:** 98 | ```bash 99 | pip install -r requirements.txt 100 | ``` 101 | 102 | 4. **Configure API Keys:** 103 | * Create a file named `.env` in the project root. 104 | * Add your API keys: 105 | ```dotenv 106 | DEEPSEEK_API_KEY="your_deepseek_api_key_here" 107 | EMBEDDING_API_KEY="your_togetherai_or_other_embedding_api_key_here" 108 | ``` 109 | 110 | 5. **Configure Feeds and Prompts:** 111 | * Review `config_base.py` for default settings and prompts. 112 | * Create a `feeds/` directory in the project root. 113 | * Inside `feeds/`, create profile configuration files (e.g., `default.py`, `tech.py`, `brasil.py`). 114 | * Each `feeds/*.py` file **must** contain an `RSS_FEEDS = [...]` list. 115 | * Optionally, define `PROMPT_CLUSTER_ANALYSIS` or `PROMPT_BRIEF_SYNTHESIS` in a `feeds/*.py` file to override the defaults from `config_base.py` for that specific profile. Define `EMBEDDING_MODEL` if overriding the default. 116 | 117 | 6. **Initialize Database:** 118 | * The database (`meridian.db`) and its schema (including FTS tables) are created automatically the first time you run `run_briefing.py` or `app.py`. 119 | 120 | ## Running the Application 121 | 122 | Meridiano consists of a command-line script (`run_briefing.py`) for data processing and a web server (`app.py`) for viewing results. 123 | 124 | **1. Running Processing Stages (`run_briefing.py`)** 125 | 126 | Use the command line to run different stages for specific feed profiles. 127 | 128 | * **Arguments:** 129 | * `--feed <profile_name>`: Specify the profile to use (e.g., `default`, `tech`, `brasil`). Defaults to `default`.
130 | * `--scrape-articles`: Run only the scraping stage. 131 | * `--process-articles`: Run only the summarization/embedding stage (per profile). 132 | * `--rate-articles`: Run only the impact rating stage (per profile). 133 | * `--generate-brief`: Run only the brief generation stage (per profile). 134 | * `--all`: Run all stages sequentially for the specified profile. 135 | * *(No stage argument)*: Defaults to running all stages (`--all`). 136 | 137 | * **Examples:** 138 | ```bash 139 | # Scrape articles for the 'tech' profile 140 | python run_briefing.py --feed tech --scrape-articles 141 | 142 | # Process and rate articles for the 'default' profile 143 | python run_briefing.py --feed default --process-articles 144 | python run_briefing.py --feed default --rate-articles 145 | 146 | # Generate the brief for the 'brasil' profile 147 | python run_briefing.py --feed brasil --generate-brief 148 | 149 | # Run all stages for the 'tech' profile 150 | python run_briefing.py --feed tech --all 151 | ``` 152 | 153 | * **Scheduling:** For automatic daily runs, use `cron` (Linux/macOS) or Task Scheduler (Windows) to execute the desired `run_briefing.py` command(s) daily. Remember to use the full path to the Python executable within your virtual environment. Example cron job (runs all stages for 'default' profile at 7 AM): 154 | ```cron 155 | 0 7 * * * /path/to/meridiano/venv/bin/python /path/to/meridiano/run_briefing.py --feed default --all >> /path/to/meridiano/meridiano.log 2>&1 156 | ``` 157 | 158 | **2. Running the Web Server (`app.py`)** 159 | 160 | * Start the Flask development server: 161 | ```bash 162 | python app.py 163 | ``` 164 | * Access the web interface in your browser, usually at `http://localhost:5000`. 165 | * For more robust deployment, consider using a production WSGI server like Gunicorn: 166 | ```bash 167 | # pip install gunicorn 168 | gunicorn --bind 0.0.0.0:5000 app:app 169 | ``` 170 | 171 | ## Contributing 172 | 173 | 1. Fork the repository on GitHub.
2. Create a new branch for your feature or bug fix (`git checkout -b feature/your-feature-name`). 175 | 3. Make your changes, adhering to the existing code style where possible. 176 | 4. (Optional but Recommended) Add tests for your changes if applicable. 177 | 5. Ensure your changes don't break existing functionality. 178 | 6. Commit your changes (`git commit -am 'Add some feature'`). 179 | 7. Push to your branch (`git push origin feature/your-feature-name`). 180 | 8. Create a Pull Request on GitHub, describing your changes clearly. 181 | 182 | ## Credits 183 | 184 | * Original concept and project: https://github.com/iliane5/meridian 185 | * Icon: https://www.svgrepo.com/svg/405007/compass 186 | 187 | ## License 188 | 189 | This project is licensed under the **GNU Affero General Public License v3.0 (AGPLv3)**. 190 | 191 | This license was chosen specifically to ensure that any modifications, derivative works, or network-hosted services based on this code remain open source and freely available to the community under the same terms. 192 | 193 | In short, if you modify and distribute this software, or run a modified version as a network service that users interact with, you **must** make the complete corresponding source code of your version available under the AGPLv3.
194 | 195 | You can find the full license text here: 196 | [https://www.gnu.org/licenses/agpl-3.0.html](https://www.gnu.org/licenses/agpl-3.0.html) 197 | -------------------------------------------------------------------------------- /templates/articles.html: -------------------------------------------------------------------------------- 1 | <!DOCTYPE html> 2 | <html lang="en"> 3 | {% set title = 4 | "Articles" ~ 5 | (current_feed_profile and ("[" ~ current_feed_profile ~ "]") or "") ~ 6 | (current_search_term and (" | Search: " ~ current_search_term) or "") ~ 7 | " (Page " ~ page ~ ")" 8 | %} 9 | {% include 'head.html' %} 10 | 11 | <body> 12 | <div class="container"> 13 | {% include 'header.html' %} 14 | <h2> 15 | Articles 16 | {% if current_feed_profile %}<span class="profile-badge">{{ current_feed_profile }}</span>{% endif %} 17 | </h2> 18 | {% if total_articles > 0 %} 19 | <p> 20 | Showing articles {{ (page - 1) * per_page + 1 }} - {{ [page * per_page, total_articles] | min }} of {{ total_articles }} total (matching filters). 
21 | </p> 22 | {% else %} 23 | <p>No articles found matching the current filters.</p> 24 | {% endif %} 25 | <form method="GET" 26 | action="{{ url_for("list_articles") }}" 27 | class="filter-sort-form"> 28 | <input type="hidden" name="sort_by" value="{{ current_sort_by }}"> 29 | <input type="hidden" name="direction" value="{{ current_direction }}"> 30 | <input type="hidden" name="start_date" value="{{ current_start_date }}"> 31 | <input type="hidden" name="end_date" value="{{ current_end_date }}"> 32 | <input type="hidden" name="preset" value="{{ current_preset }}"> 33 | <div class="form-row form-section"> 34 | <div class="profile-filter"> 35 | <label for="feed_profile_select">Profile:</label> 36 | <select name="feed_profile" 37 | id="feed_profile_select" 38 | onchange="this.form.submit()"> 39 | <option value="" {% if not current_feed_profile %}selected{% endif %}>All Profiles</option> 40 | {% for profile in available_profiles %} 41 | <option value="{{ profile }}" 42 | {% if current_feed_profile == profile %}selected{% endif %}> 43 | {{ profile }} 44 | </option> 45 | {% endfor %} 46 | </select> 47 | </div> 48 | <div class="search-filter"> 49 | <input type="search" 50 | name="search" 51 | id="search_box" 52 | placeholder="Search title & content..." 
53 | value="{{ current_search_term }}"> 54 | <button type="submit" class="btn btn-search" title="Search"> 55 | <i class="fas fa-search"></i> 56 | </button> 57 | {% if current_search_term %} 58 | <a href="{{ url_for('list_articles', page=1, sort_by=current_sort_by, direction=current_direction, start_date=current_start_date, end_date=current_end_date, preset=current_preset, feed_profile=current_feed_profile) }}" 59 | class="btn btn-clear-search" 60 | title="Clear Search"><i class="fas fa-times"></i></a> 61 | {% endif %} 62 | </div> 63 | </div> 64 | <div class="form-section"> 65 | <button type="button" 66 | class="date-filter-toggle" 67 | onclick="toggleDateFilters()"> 68 | Date Filters <span class="toggle-icon"><i class="fas fa-calendar-alt"></i></span> 69 | </button> 70 | <div id="date-filter-content" 71 | class="date-filter-content" 72 | style="display: none"> 73 | <div class="date-inputs"> 74 | <label for="start_date">From:</label> 75 | <input type="date" 76 | id="start_date" 77 | name="start_date" 78 | value="{{ current_start_date }}"> 79 | <label for="end_date">To:</label> 80 | <input type="date" 81 | id="end_date" 82 | name="end_date" 83 | value="{{ current_end_date }}"> 84 | <button type="submit" class="btn btn-filter"><i class="fas fa-filter"></i> Apply Filters</button> 85 | <a href="{{ url_for('list_articles', sort_by=current_sort_by, direction=current_direction, feed_profile=current_feed_profile, search=current_search_term) }}" 86 | class="btn btn-clear"><i class="fas fa-times-circle"></i> Clear Dates</a> 87 | </div> 88 | <div class="preset-buttons"> 89 | {% set presets = { 90 | "yesterday": "Yesterday", 91 | "last_week": "Last Week", 92 | "last_30d": "Last 30d", 93 | "last_3m": "Last 3mo", 94 | "last_12m": "Last 12mo" 95 | } %} 96 | {% for key, label in presets.items() %} 97 | <a href="{{ url_for('list_articles', preset=key, sort_by=current_sort_by, direction=current_direction, feed_profile=current_feed_profile, search=current_search_term) }}" 98 | 
class="btn btn-preset {{ 'active' if current_preset == key else '' }}">{{ label }}</a> 99 | {% endfor %} 100 | </div> 101 | </div> 102 | </div> 103 | <div class="sort-controls form-section"> 104 | Sort by: 105 | {% set sort_fields = {"published_date": "Published Date", "impact_score": "Impact Score"} %} 106 | {% for field, label in sort_fields.items() %} 107 | {% set is_active = (current_sort_by == field) %} 108 | {% set next_direction = 'asc' if (is_active and current_direction == 'desc') else 'desc' %} 109 | <a href="{{ url_for('list_articles', page=1, sort_by=field, direction=next_direction, start_date=current_start_date, end_date=current_end_date, preset=current_preset, feed_profile=current_feed_profile, search=current_search_term) }}" 110 | class="sort-link {{ 'active' if is_active else '' }}">{{ label }} 111 | {% if is_active %}<span class="sort-indicator">{{ '▲' if current_direction == 'asc' else '▼' }}</span>{% endif %} 112 | </a> 113 | {% endfor %} 114 | </div> 115 | </form> 116 | <ul class="article-list"> 117 | {% for article in articles %} 118 | <li class="article-item"> 119 | <div class="article-image-container"> 120 | {% if article.image_url %} 121 | <img src="{{ article.image_url }}" 122 | alt="Image for {{ article.title | default("") }}" 123 | class="article-image" 124 | loading="lazy" 125 | onerror="this.style.display='none'; this.nextElementSibling.style.display='flex';"> 126 | {% else %} 127 | <div class="article-image-placeholder"> 128 | <i class="fas fa-image"></i> No Image 129 | </div> 130 | {% endif %} 131 | </div> 132 | <div class="article-text-content"> 133 | <span class="profile-badge list-badge">{{ article.feed_profile }}</span> 134 | {% if article.impact_score is not none %} 135 | <span class="impact-score score-{{ article.impact_score }}" 136 | title="Impact Score: {{ article.impact_score }}/10"> 137 | {{ article.impact_score }} 138 | </span> 139 | {% else %} 140 | <span class="impact-score score-unknown" title="Impact Score: Not 
Rated">-</span> 141 | {% endif %} 142 | <div class="article-link-wrapper"> 143 | <a href="{{ url_for('view_article', article_id=article['id']) }}" 144 | class="article-link">{{ article['title'] | default("Untitled Article") }}</a> | 145 | <a href="{{ article['url'] }}" class="article-link-original">Original</a> 146 | </div> 147 | <span class="article-meta"> 148 | ({{ article['feed_source'] | default("Unknown Source") }} / {{ article['published_date'] | datetimeformat }}) 149 | </span> 150 | <div class="article-summary-list">{{ article['processed_content_html'] | safe }}</div> 151 | </div> 152 | </li> 153 | {% endfor %} 154 | </ul> 155 | {% if total_pages > 1 %} 156 | <div class="pagination"> 157 | {% if page > 1 %} 158 | <a href="{{ url_for('list_articles', page=page-1, sort_by=current_sort_by, direction=current_direction, start_date=current_start_date, end_date=current_end_date, preset=current_preset, feed_profile=current_feed_profile, search=current_search_term) }}" 159 | class="page-link prev"><i class="fas fa-chevron-left"></i> Previous</a> 160 | {% else %} 161 | <span class="page-link disabled prev"><i class="fas fa-chevron-left"></i> Previous</span> 162 | {% endif %} 163 | <span class="page-info">Page {{ page }} of {{ total_pages }}</span> 164 | {% if page < total_pages %} 165 | <a href="{{ url_for('list_articles', page=page+1, sort_by=current_sort_by, direction=current_direction, start_date=current_start_date, end_date=current_end_date, preset=current_preset, feed_profile=current_feed_profile, search=current_search_term) }}" 166 | class="page-link next">Next <i class="fas fa-chevron-right"></i></a> 167 | {% else %} 168 | <span class="page-link disabled next">Next <i class="fas fa-chevron-right"></i></span> 169 | {% endif %} 170 | </div> 171 | {% endif %} 172 | </div> 173 | <script> 174 | function toggleDateFilters() { 175 | var content = document.getElementById('date-filter-content'); 176 | var icon = document.querySelector('.date-filter-toggle 
.toggle-icon'); 177 | if (content.style.display === 'none' || content.style.display === '') { 178 | content.style.display = 'block'; 179 | icon.textContent = '-'; 180 | } else { 181 | content.style.display = 'none'; 182 | icon.textContent = '+'; 183 | } 184 | } 185 | document.addEventListener('DOMContentLoaded', function() { 186 | var startDate = document.getElementById('start_date').value; 187 | var endDate = document.getElementById('end_date').value; 188 | if (startDate || endDate) { 189 | toggleDateFilters(); 190 | } 191 | }); 192 | </script> 193 | </body> 194 | </html> 195 | -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | # simple-meridian/app.py 2 | 3 | from flask import Flask, render_template, request, abort, flash, redirect, url_for 4 | from markupsafe import Markup 5 | import markdown 6 | from datetime import datetime, timedelta, date 7 | import math 8 | import json 9 | import os 10 | 11 | import config_base as config # Use base config for app settings 12 | import database # Import our database functions 13 | from sqlmodel import select 14 | 15 | from utils import scrape_single_article_details, format_datetime 16 | 17 | app = Flask(__name__) 18 | app.secret_key = os.getenv("FLASK_SECRET_KEY", "a_default_secret_key_for_development_only") 19 | 20 | # Register the filter with Jinja 21 | app.jinja_env.filters['datetimeformat'] = format_datetime 22 | 23 | @app.route('/') 24 | def index(): 25 | """Displays a list of briefings, filterable by feed profile.""" 26 | current_feed_profile = request.args.get('feed_profile', '') # Get filter, empty means 'All' 27 | briefs_metadata = database.get_all_briefs_metadata( 28 | feed_profile=current_feed_profile if current_feed_profile else None # Pass None for 'All' 29 | ) 30 | # Get profiles for dropdown 31 | available_profiles = database.get_distinct_feed_profiles(table='briefs') 32 | 33 | return 
render_template('index.html', 34 | briefs=briefs_metadata, 35 | available_profiles=available_profiles, 36 | current_feed_profile=current_feed_profile) 37 | 38 | @app.route('/brief/<int:brief_id>') 39 | def view_brief(brief_id): 40 | """Displays a single specific briefing.""" 41 | brief_data = database.get_brief_by_id(brief_id) 42 | 43 | if brief_data is None: 44 | abort(404) # Return a 404 error if brief not found 45 | 46 | brief_content_html = Markup(markdown.markdown(brief_data['brief_markdown'], extensions=['fenced_code'])) 47 | generation_time = format_datetime(brief_data['generated_at'], '%Y-%m-%d %H:%M:%S UTC') 48 | 49 | return render_template('view_brief.html', # Use a new template for viewing 50 | brief_id=brief_data['id'], 51 | brief_content=brief_content_html, 52 | generation_time=generation_time) 53 | 54 | @app.route('/articles') 55 | def list_articles(): 56 | """ Displays a paginated list of stored articles with search, sorting and date filtering. """ 57 | # --- Pagination --- 58 | try: page = int(request.args.get('page', 1)) 59 | except ValueError: page = 1 60 | page = max(1, page) 61 | per_page = getattr(config, 'ARTICLES_PER_PAGE', 25) 62 | 63 | # --- Sorting --- 64 | sort_by = request.args.get('sort_by', 'published_date') 65 | direction = request.args.get('direction', 'desc') 66 | if direction not in ['asc', 'desc']: direction = 'desc' 67 | 68 | # --- Date Filtering --- 69 | start_date_str = request.args.get('start_date', '') 70 | end_date_str = request.args.get('end_date', '') 71 | preset = request.args.get('preset', '') 72 | 73 | start_date, end_date = None, None # Initialize date objects 74 | 75 | # Calculate dates based on preset if provided 76 | if preset: 77 | today = date.today() 78 | if preset == 'yesterday': 79 | start_date = today - timedelta(days=1) 80 | end_date = start_date 81 | elif preset == 'last_week': # Last 7 days including today 82 | start_date = today - timedelta(days=6) 83 | end_date = today 84 | elif preset == 'last_30d': 85 | 
start_date = today - timedelta(days=29) 86 | end_date = today 87 | elif preset == 'last_3m': # Approx 90 days 88 | start_date = today - timedelta(days=89) 89 | end_date = today 90 | elif preset == 'last_12m': # Approx 365 days 91 | start_date = today - timedelta(days=364) 92 | end_date = today 93 | 94 | # Convert calculated dates back to strings for template pre-filling 95 | start_date_str = start_date.isoformat() if start_date else '' 96 | end_date_str = end_date.isoformat() if end_date else '' 97 | 98 | # If no preset, try parsing manual dates 99 | else: 100 | try: 101 | if start_date_str: 102 | start_date = date.fromisoformat(start_date_str) 103 | except ValueError: 104 | start_date_str = '' # Clear invalid date string 105 | start_date = None 106 | print(f"Warning: Invalid start_date format '{request.args.get('start_date')}'") 107 | try: 108 | if end_date_str: 109 | end_date = date.fromisoformat(end_date_str) 110 | except ValueError: 111 | end_date_str = '' # Clear invalid date string 112 | end_date = None 113 | print(f"Warning: Invalid end_date format '{request.args.get('end_date')}'") 114 | 115 | # Feed Profile Filtering 116 | current_feed_profile = request.args.get('feed_profile', '') # Empty means 'All' 117 | 118 | # Search Term Filter 119 | current_search_term = request.args.get('search', '').strip() # Get search term, trim whitespace 120 | 121 | # Fetch total count with ALL filters 122 | total_articles = database.get_total_article_count( 123 | start_date=start_date, end_date=end_date, 124 | feed_profile=current_feed_profile if current_feed_profile else None, 125 | search_term=current_search_term if current_search_term else None 126 | ) 127 | 128 | # Fetch articles with ALL filters and sorting 129 | articles_data = database.get_all_articles( 130 | page=page, per_page=per_page, sort_by=sort_by, direction=direction, 131 | start_date=start_date, end_date=end_date, 132 | feed_profile=current_feed_profile if current_feed_profile else None, 133 | 
search_term=current_search_term if current_search_term else None 134 | ) 135 | 136 | articles_data = [ 137 | {**article, 'processed_content_html': Markup(markdown.markdown(article['processed_content'] or '', extensions=['fenced_code']))} 138 | for article in articles_data 139 | ] 140 | 141 | # Calculate total pages based on filtered count 142 | if total_articles > 0: total_pages = math.ceil(total_articles / per_page) 143 | else: total_pages = 0 144 | if page > total_pages and total_pages > 0: 145 | # Optional: redirect to last valid page if request goes beyond 146 | args = request.args.copy() 147 | args['page'] = total_pages 148 | # return redirect(url_for('list_articles', **args)) # Redirect approach 149 | page = total_pages # Simpler: just set page to last page 150 | 151 | # Get profiles for dropdown 152 | available_profiles = database.get_distinct_feed_profiles(table='articles') 153 | 154 | return render_template('articles.html', 155 | articles=articles_data, 156 | page=page, total_pages=total_pages, per_page=per_page, 157 | total_articles=total_articles, # Filtered total 158 | current_sort_by=sort_by, current_direction=direction, 159 | current_start_date=start_date_str, current_end_date=end_date_str, 160 | current_preset=preset, 161 | available_profiles=available_profiles, 162 | current_feed_profile=current_feed_profile, 163 | current_search_term=current_search_term) 164 | 165 | @app.route('/article/<int:article_id>') 166 | def view_article(article_id): 167 | """Displays details for a single specific article.""" 168 | article_data_immutable = database.get_article_by_id(article_id) 169 | 170 | if article_data_immutable is None: 171 | abort(404) # Return a 404 error if article not found 172 | 173 | article_data = dict(article_data_immutable) 174 | 175 | summary_markdown = article_data.get('processed_content', '') or '' 176 | # The summary includes a source link, so markdown rendering is useful 177 | article_data['processed_content_html'] = 
Markup(markdown.markdown(summary_markdown, extensions=['fenced_code'])) 178 | 179 | # Basic check if embedding data exists (without showing the vector) 180 | embedding_status = "Not Generated" 181 | if article_data['embedding']: 182 | try: 183 | # Try loading to see if it's valid JSON and not empty 184 | embed_data = json.loads(article_data['embedding']) 185 | if embed_data: 186 | # You could potentially calculate dimension here if needed: len(embed_data) 187 | embedding_status = "Present" 188 | else: 189 | embedding_status = "Present (Empty)" 190 | except (json.JSONDecodeError, TypeError): 191 | embedding_status = "Present (Invalid Format)" 192 | 193 | 194 | return render_template('view_article.html', # Use a new template 195 | article=article_data, 196 | embedding_status=embedding_status) 197 | 198 | @app.route('/add_article', methods=['GET', 'POST']) 199 | def add_manual_article(): 200 | if request.method == 'POST': 201 | article_url = request.form.get('article_url', '').strip() 202 | feed_profile_to_assign = request.form.get('feed_profile_assign', '').strip() 203 | if not feed_profile_to_assign: 204 | feed_profile_to_assign = getattr(config, 'MANUALLY_ADDED_PROFILE_NAME', 'manual') 205 | 206 | if not article_url: 207 | flash('Article URL is required.', 'error') 208 | return redirect(url_for('add_manual_article')) # Redirect back to form 209 | 210 | if not (article_url.startswith('http://') or article_url.startswith('https://')): # ... 211 | flash('Invalid URL. 
Please include http:// or https://.', 'error') 212 | return redirect(url_for('add_manual_article')) 213 | 214 | # Check if article already exists (SQLModel session) 215 | with database.get_db_connection() as session: 216 | stmt = select(database.Article).where(database.Article.url == article_url) 217 | existing_article = session.exec(stmt).first() 218 | if existing_article: 219 | flash(f'Article from URL "{article_url}" already exists (ID: {existing_article.id}).', 'warning') 220 | return redirect(url_for('list_articles')) 221 | 222 | # --- Attempt to scrape details immediately --- 223 | scraped_details = scrape_single_article_details(article_url) 224 | 225 | if scraped_details['error'] and not (scraped_details['title'] or scraped_details['raw_content']): 226 | # If major error and nothing was scraped, flash it and retry 227 | flash(f"Error scraping article: {scraped_details['error']}", 'error') 228 | return redirect(url_for('add_manual_article')) 229 | elif scraped_details['error']: 230 | # If minor error (e.g., no image but content scraped), flash as warning but proceed 231 | flash(f"Warning during scraping: {scraped_details['error']}", 'warning') 232 | # --- End immediate scrape --- 233 | 234 | try: 235 | # Use scraped details if available, otherwise fall back to placeholders 236 | final_title = scraped_details['title'] if scraped_details['title'] else "Manually Added - Pending Title" 237 | final_raw_content = scraped_details['raw_content'] # Can be None 238 | final_image_url = scraped_details['image_url'] # Can be None 239 | 240 | article_id = database.add_article( 241 | url=article_url, 242 | title=final_title, 243 | published_date=datetime.now(), # Or try to get from OG tags if scrape_single enhanced 244 | feed_source="Manual Addition", 245 | raw_content=final_raw_content, 246 | feed_profile=feed_profile_to_assign, 247 | image_url=final_image_url 248 | ) 249 | if article_id: 250 | if final_raw_content: 251 | flash(f'Article "{final_title}" added and scraped 
successfully for profile "{feed_profile_to_assign}" (ID: {article_id}). It will be processed for summary/impact soon.', 'success') 252 | else: # If scraping content failed but we still added the URL 253 | flash(f'Article URL "{article_url}" added to profile "{feed_profile_to_assign}" (ID: {article_id}). Content scraping failed, will be retried by batch process.', 'warning') 254 | return redirect(url_for('view_article', article_id=article_id)) 255 | else: 256 | flash('Failed to add article to database. It might already exist with a different error or there was an unknown issue.', 'error') 257 | except Exception as e: 258 | # logger.error(f"Error adding manual article {article_url} to DB: {e}", exc_info=True) 259 | print(f"ERROR adding manual article {article_url} to DB: {e}") 260 | flash(f'An error occurred while adding the article to the database: {e}', 'error') 261 | 262 | return redirect(url_for('add_manual_article')) 263 | 264 | # GET request: Show the form 265 | # Fetch available profiles for the dropdown 266 | available_profiles = database.get_distinct_feed_profiles(table='articles') 267 | # Add the special manual profile if it's not already a common one 268 | manual_profile_name = getattr(config, 'MANUALLY_ADDED_PROFILE_NAME', 'manual') 269 | if manual_profile_name not in available_profiles: 270 | # Don't permanently add it to the list of *scrapeable* profiles, just offer as option 271 | pass # User will select or it defaults 272 | 273 | return render_template('add_article.html', available_profiles=available_profiles, default_manual_profile=manual_profile_name) 274 | 275 | if __name__ == '__main__': 276 | database.init_db() 277 | app.run(host='0.0.0.0', port=5000, debug=True) 278 | -------------------------------------------------------------------------------- /database.py: -------------------------------------------------------------------------------- 1 | """ 2 | Database operations using SQLModel for the Meridiano application. 
3 | This replaces the SQLite-based database.py with modern SQLModel operations. 4 | """ 5 | 6 | import json 7 | import logging 8 | from datetime import datetime, timedelta, date 9 | from typing import Any, Dict, List, Optional 10 | 11 | from sqlalchemy.exc import IntegrityError 12 | from sqlmodel import and_, asc, desc, func, or_, select 13 | from sqlalchemy import text 14 | 15 | import config_base as config 16 | from models import Article, Brief, get_session 17 | 18 | logger = logging.getLogger(__name__) 19 | 20 | ARTICLES_PER_PAGE_DEFAULT = 25 21 | 22 | 23 | def get_db_connection(): 24 | """Returns a new database session (replaces SQLite connection)""" 25 | return get_session() 26 | 27 | 28 | def init_db(): 29 | """Initialize the database - create all tables""" 30 | from models import init_db as model_init_db 31 | 32 | model_init_db() 33 | 34 | 35 | def get_unrated_articles( 36 | feed_profile: str, limit: int = 50 37 | ) -> List[Dict[str, Any]]: 38 | """Gets processed articles that haven't been rated yet.""" 39 | with get_session() as session: 40 | statement = ( 41 | select(Article) 42 | .where( 43 | and_( 44 | Article.processed_content.is_not(None), 45 | Article.processed_content != "", 46 | Article.processed_at.is_not(None), 47 | Article.impact_score.is_(None), 48 | Article.feed_profile == feed_profile, 49 | ) 50 | ) 51 | .order_by(desc(Article.processed_at)) 52 | .limit(limit) 53 | ) 54 | 55 | articles = session.exec(statement).all() 56 | return [_article_to_dict(article) for article in articles] 57 | 58 | 59 | def update_article_rating(article_id: int, impact_score: int) -> None: 60 | """Updates an article with its impact score.""" 61 | with get_session() as session: 62 | statement = select(Article).where(Article.id == article_id) 63 | article = session.exec(statement).first() 64 | if article: 65 | article.impact_score = impact_score 66 | session.add(article) 67 | session.commit() 68 | 69 | 70 | def get_article_by_id(article_id: int) -> Optional[Dict[str, 
Any]]: 71 | """Retrieves all data for a specific article by its ID.""" 72 | with get_session() as session: 73 | statement = select(Article).where(Article.id == article_id) 74 | article = session.exec(statement).first() 75 | return _article_to_dict(article) if article else None 76 | 77 | 78 | def _article_to_dict(article: Article) -> Dict[str, Any]: 79 | """Convert Article model to dictionary for compatibility with existing code.""" 80 | if not article: 81 | return None 82 | 83 | return article.model_dump( 84 | include={ 85 | "id", 86 | "url", 87 | "title", 88 | "published_date", 89 | "feed_source", 90 | "fetched_at", 91 | "raw_content", 92 | "processed_content", 93 | "embedding", 94 | "processed_at", 95 | "cluster_id", 96 | "impact_score", 97 | "image_url", 98 | "feed_profile", 99 | } 100 | ) 101 | 102 | 103 | def _brief_to_dict(brief: Brief) -> Dict[str, Any]: 104 | """Convert Brief model to dictionary for compatibility with existing code.""" 105 | if not brief: 106 | return None 107 | 108 | return brief.model_dump( 109 | include={ 110 | "id", 111 | "generated_at", 112 | "brief_markdown", 113 | "contributing_article_ids", 114 | "feed_profile", 115 | } 116 | ) 117 | 118 | def _build_article_filters( 119 | start_date: Optional[date] = None, 120 | end_date: Optional[date] = None, 121 | feed_profile: Optional[str] = None, 122 | ): 123 | """Helper for building filter conditions for articles.""" 124 | filters = [] 125 | 126 | if start_date: 127 | filters.append(func.date(Article.published_date) >= func.date(start_date)) 128 | if end_date: 129 | filters.append(func.date(Article.published_date) <= func.date(end_date)) 130 | if feed_profile: 131 | filters.append(Article.feed_profile == feed_profile) 132 | 133 | return filters 134 | 135 | 136 | def get_all_articles( 137 | page: int = 1, 138 | per_page: int = ARTICLES_PER_PAGE_DEFAULT, 139 | sort_by: str = "published_date", 140 | direction: str = "desc", 141 | start_date: Optional[date] = None, 142 | end_date: Optional[date] 
= None, 143 | feed_profile: Optional[str] = None, 144 | search_term: Optional[str] = None, 145 | ) -> List[Dict[str, Any]]: 146 | """ 147 | Fetches articles with filtering, sorting, and full-text search. 148 | Uses PostgreSQL full-text search when available, falls back to LIKE search. 149 | """ 150 | with get_session() as session: 151 | # Start with base query 152 | statement = select(Article) 153 | 154 | # Apply basic filters 155 | filters = _build_article_filters(start_date, end_date, feed_profile) 156 | if filters: 157 | statement = statement.where(and_(*filters)) 158 | 159 | # Apply search if provided 160 | if search_term: 161 | if "postgresql" in config.DATABASE_URL.lower(): 162 | # PostgreSQL full-text search 163 | search_vector = func.to_tsvector( 164 | "english", 165 | func.coalesce(Article.title, "") 166 | + " " 167 | + func.coalesce(Article.raw_content, ""), 168 | ) 169 | # Use SQLAlchemy's match with a plain string and specify the Postgres 170 | # text search configuration to avoid nesting plainto_tsquery calls. 
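The SQLite branch of this search path boils down to a plain `LIKE` match over `title` and `raw_content`. A minimal, self-contained sketch of that fallback (the two-column `articles` table here is a throwaway stand-in, not the real model) — SQLite's default `LIKE` is already case-insensitive for ASCII, which is why it approximates `ilike()`:

```python
# Hypothetical in-memory stand-in for the articles table, used only to
# demonstrate the LIKE-based fallback search.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, raw_content TEXT)"
)
conn.executemany(
    "INSERT INTO articles (title, raw_content) VALUES (?, ?)",
    [
        ("Election Results", "Full coverage of the vote count."),
        ("Weather Report", "An ELECTION observer was also quoted."),
    ],
)

search_term = "election"
pattern = f"%{search_term}%"
# SQLite's LIKE is case-insensitive for ASCII characters, so this roughly
# matches the behaviour of ilike() in the ORM query.
rows = conn.execute(
    "SELECT id FROM articles WHERE title LIKE ? OR raw_content LIKE ? ORDER BY id",
    (pattern, pattern),
).fetchall()
matched_ids = [row[0] for row in rows]  # -> [1, 2]
```

Both rows match: the first via its title, the second via the upper-case mention in its raw content.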
171 | statement = statement.where( 172 | search_vector.match(search_term, postgresql_regconfig='english') 173 | ) 174 | else: 175 | # Fallback to LIKE search for SQLite 176 | search_filter = or_( 177 | Article.title.ilike(f"%{search_term}%"), 178 | Article.raw_content.ilike(f"%{search_term}%"), 179 | ) 180 | statement = statement.where(search_filter) 181 | 182 | # Apply sorting 183 | sort_columns = { 184 | "published_date": Article.published_date, 185 | "impact_score": Article.impact_score, 186 | "fetched_at": Article.fetched_at, 187 | } 188 | 189 | sort_column = sort_columns.get(sort_by, Article.published_date) 190 | if direction.lower() == "asc": 191 | statement = statement.order_by(asc(sort_column), desc(Article.id)) 192 | else: 193 | statement = statement.order_by(desc(sort_column), desc(Article.id)) 194 | 195 | # Apply pagination 196 | offset = (page - 1) * per_page 197 | statement = statement.offset(offset).limit(per_page) 198 | 199 | articles = session.exec(statement).all() 200 | return [_article_to_dict(article) for article in articles] 201 | 202 | 203 | def get_total_article_count( 204 | start_date: Optional[date] = None, 205 | end_date: Optional[date] = None, 206 | feed_profile: Optional[str] = None, 207 | search_term: Optional[str] = None, 208 | ) -> int: 209 | """Returns total count of articles with optional filtering and search.""" 210 | with get_session() as session: 211 | # Start with base query 212 | statement = select(func.count(Article.id)) 213 | 214 | # Apply basic filters 215 | filters = _build_article_filters(start_date, end_date, feed_profile) 216 | if filters: 217 | statement = statement.where(and_(*filters)) 218 | 219 | # Apply search if provided 220 | if search_term: 221 | if "postgresql" in config.DATABASE_URL.lower(): 222 | # PostgreSQL full-text search 223 | search_vector = func.to_tsvector( 224 | "english", 225 | func.coalesce(Article.title, "") 226 | + " " 227 | + func.coalesce(Article.raw_content, ""), 228 | ) 229 | # Use SQLAlchemy's 
match with a plain string and specify the Postgres 230 | # text search configuration to avoid nesting plainto_tsquery calls. 231 | statement = statement.where( 232 | search_vector.match(search_term, postgresql_regconfig='english') 233 | ) 234 | else: 235 | # Fallback to LIKE search 236 | search_filter = or_( 237 | Article.title.ilike(f"%{search_term}%"), 238 | Article.raw_content.ilike(f"%{search_term}%"), 239 | ) 240 | statement = statement.where(search_filter) 241 | 242 | return session.exec(statement).one() 243 | 244 | 245 | def add_article( 246 | url: str, 247 | title: str, 248 | published_date: datetime, 249 | feed_source: str, 250 | raw_content: str, 251 | feed_profile: str, 252 | image_url: Optional[str] = None, 253 | ) -> Optional[int]: 254 | """Adds a new article with optional image URL.""" 255 | with get_session() as session: 256 | try: 257 | # Ensure Postgres sequence is in sync to avoid duplicate primary key errors 258 | if "postgresql" in config.DATABASE_URL.lower(): 259 | try: 260 | # Sync the sequence to the current max(id) so nextval() will produce a fresh value. 
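The `setval(pg_get_serial_sequence(...))` guard is easiest to understand with a toy model: a Postgres SERIAL sequence only advances via `nextval()`, so rows inserted with explicit ids (for example, restored from a backup) can leave the sequence behind `max(id)`, and a later `nextval()` then collides with an existing primary key. This is a pure-Python simulation of that failure mode and of the sync fix — the `Sequence` class is a made-up stand-in for illustration, not a Postgres API:

```python
# Toy stand-in for a Postgres SERIAL sequence (hypothetical class, for
# illustration only): it only advances when nextval() is called.
class Sequence:
    def __init__(self, value: int = 0) -> None:
        self.value = value

    def nextval(self) -> int:
        self.value += 1
        return self.value


seq = Sequence()
table = {seq.nextval(): "normal insert"}    # id 1, allocated by the sequence
table[5] = "row restored with explicit id"  # sequence was NOT advanced

next_id = seq.nextval()  # 2 -- still fine for now...
# ...but nextval() would eventually return 5 and collide with the restored row.

# The fix mirrors:
#   SELECT setval(pg_get_serial_sequence('articles','id'),
#                 COALESCE((SELECT MAX(id) FROM articles), 1))
seq.value = max(table)   # sync the sequence to the current max(id)
fresh_id = seq.nextval() # 6 -- guaranteed unused
```

Since the real statement only ever moves the sequence up to the table's current maximum id, re-running it is harmless, which is why a failure here is treated as non-critical.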
261 | session.exec( 262 | text( 263 | "SELECT setval(pg_get_serial_sequence('articles','id'), COALESCE((SELECT MAX(id) FROM articles), 1))" 264 | ) 265 | ) 266 | except Exception as e: 267 | # Log warning but continue - this is usually non-critical for new inserts 268 | logger.warning(f"PostgreSQL sequence sync warning (non-critical): {e}") 269 | # Continue - this is usually fine for new inserts 270 | 271 | article = Article( 272 | url=url, 273 | title=title, 274 | published_date=published_date, 275 | feed_source=feed_source, 276 | raw_content=raw_content, 277 | image_url=image_url, 278 | feed_profile=feed_profile, 279 | fetched_at=datetime.now(), 280 | ) 281 | session.add(article) 282 | session.commit() 283 | session.refresh(article) # Get the ID 284 | print(f"Added article [{feed_profile}]: {title}") 285 | return article.id 286 | except IntegrityError: 287 | session.rollback() 288 | return None 289 | 290 | 291 | def get_unprocessed_articles( 292 | feed_profile: str, limit: int = 50 293 | ) -> List[Dict[str, Any]]: 294 | """Gets articles that haven't been processed yet.""" 295 | with get_session() as session: 296 | statement = ( 297 | select(Article) 298 | .where( 299 | and_( 300 | Article.processed_at.is_(None), 301 | Article.raw_content.is_not(None), 302 | Article.raw_content != "", 303 | Article.feed_profile == feed_profile, 304 | ) 305 | ) 306 | .order_by(desc(Article.fetched_at)) 307 | .limit(limit) 308 | ) 309 | 310 | articles = session.exec(statement).all() 311 | return [_article_to_dict(article) for article in articles] 312 | 313 | 314 | def update_article_processing( 315 | article_id: int, processed_content: str, embedding: Optional[List[float]] 316 | ) -> None: 317 | """Updates an article with its summary, embedding, and processed timestamp.""" 318 | with get_session() as session: 319 | statement = select(Article).where(Article.id == article_id) 320 | article = session.exec(statement).first() 321 | if article: 322 | article.processed_content = 
processed_content 323 | article.embedding = json.dumps(embedding) if embedding else None 324 | article.processed_at = datetime.now() 325 | session.add(article) 326 | session.commit() 327 | 328 | 329 | def get_articles_for_briefing( 330 | lookback_hours: int, feed_profile: str 331 | ) -> List[Dict[str, Any]]: 332 | """Gets recently processed articles for a specific feed profile.""" 333 | cutoff_time = datetime.now() - timedelta(hours=lookback_hours) 334 | 335 | with get_session() as session: 336 | statement = ( 337 | select(Article) 338 | .where( 339 | and_( 340 | Article.processed_at >= cutoff_time, 341 | Article.embedding.is_not(None), 342 | Article.feed_profile == feed_profile, 343 | ) 344 | ) 345 | .order_by(desc(Article.processed_at)) 346 | ) 347 | 348 | articles = session.exec(statement).all() 349 | return [_article_to_dict(article) for article in articles] 350 | 351 | 352 | def save_brief( 353 | brief_markdown: str, contributing_article_ids: List[int], feed_profile: str 354 | ) -> int: 355 | """Saves the generated brief including its feed profile.""" 356 | with get_session() as session: 357 | ids_json = json.dumps(contributing_article_ids) 358 | brief = Brief( 359 | brief_markdown=brief_markdown, 360 | contributing_article_ids=ids_json, 361 | feed_profile=feed_profile, 362 | generated_at=datetime.now(), 363 | ) 364 | 365 | # Guarantee unique and sequential id 366 | if "postgresql" in config.DATABASE_URL.lower(): 367 | try: 368 | session.exec( 369 | text( 370 | "SELECT setval(pg_get_serial_sequence('briefs','id'), COALESCE((SELECT MAX(id) FROM briefs), 1))" 371 | ) 372 | ) 373 | except Exception: 374 | pass 375 | 376 | session.add(brief) 377 | session.commit() 378 | session.refresh(brief) # Get the ID 379 | print(f"Saved brief [{feed_profile}] with ID: {brief.id}") 380 | return brief.id 381 | 382 | 383 | def get_all_briefs_metadata( 384 | feed_profile: Optional[str] = None, 385 | ) -> List[Dict[str, Any]]: 386 | """Retrieves ID, timestamp, and profile for 
briefs, newest first, optionally filtered.""" 387 | with get_session() as session: 388 | statement = select(Brief) 389 | 390 | if feed_profile: 391 | statement = statement.where(Brief.feed_profile == feed_profile) 392 | 393 | statement = statement.order_by(desc(Brief.generated_at)) 394 | briefs = session.exec(statement).all() 395 | 396 | return [_brief_to_dict(brief) for brief in briefs] 397 | 398 | 399 | def get_brief_by_id(brief_id: int) -> Optional[Dict[str, Any]]: 400 | """Retrieves a specific brief's content and timestamp by its ID.""" 401 | with get_session() as session: 402 | statement = select(Brief).where(Brief.id == brief_id) 403 | brief = session.exec(statement).first() 404 | return _brief_to_dict(brief) if brief else None 405 | 406 | 407 | def get_distinct_feed_profiles(table: str = "articles") -> List[str]: 408 | """Gets a list of distinct feed_profile values from a table.""" 409 | if table not in ["articles", "briefs"]: 410 | raise ValueError("Invalid table name for distinct profiles.") 411 | 412 | with get_session() as session: 413 | if table == "articles": 414 | statement = ( 415 | select(Article.feed_profile) 416 | .distinct() 417 | .order_by(Article.feed_profile) 418 | ) 419 | result = session.exec(statement).all() 420 | else: # table == 'briefs' 421 | statement = ( 422 | select(Brief.feed_profile) 423 | .distinct() 424 | .order_by(Brief.feed_profile) 425 | ) 426 | result = session.exec(statement).all() 427 | 428 | return list(result) 429 | -------------------------------------------------------------------------------- /static/css/style.css: -------------------------------------------------------------------------------- 1 | /* simple-meridian/static/style.css */ 2 | 3 | body { 4 | font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; /* Common system font stack */ 5 | line-height: 1.6; 6 | margin: 0; 7 | padding: 0; 8 | background-color: 
#f8f9fa; 9 | color: #212529; /* Slightly darker text */ 10 | } 11 | 12 | .container { 13 | max-width: 800px; 14 | margin: 20px auto; 15 | padding: 25px; 16 | background-color: #ffffff; 17 | border-radius: 8px; 18 | box-shadow: 0 2px 5px rgba(0,0,0,0.08); 19 | } 20 | 21 | /* --- Header Styling --- */ 22 | .main-header { 23 | background-color: #ffffff; 24 | border-bottom: 1px solid #dee2e6; 25 | padding: 10px 0; 26 | margin-bottom: 25px; /* Space below header */ 27 | } 28 | 29 | .header-content { 30 | max-width: 800px; 31 | margin: 0 auto; 32 | display: flex; 33 | justify-content: space-between; 34 | align-items: center; 35 | } 36 | 37 | .logo { 38 | font-size: 1.5em; 39 | font-weight: 600; 40 | color: #212529; 41 | text-decoration: none; 42 | } 43 | .logo:hover { 44 | color: #0056b3; 45 | } 46 | 47 | .main-nav { 48 | display: flex; 49 | gap: 25px; 50 | flex-grow: 1; /* Pushes actions to the right */ 51 | margin-left: 40px; 52 | } 53 | 54 | .main-nav a { 55 | text-decoration: none; 56 | color: #495057; 57 | font-size: 1.0em; 58 | padding: 5px 0; 59 | border-bottom: 2px solid transparent; 60 | transition: color 0.2s ease, border-color 0.2s ease; 61 | } 62 | 63 | .main-nav a:hover { 64 | color: #0056b3; 65 | border-bottom-color: #007bff; 66 | } 67 | 68 | .header-actions { 69 | /* Container for buttons or user info */ 70 | } 71 | /* --- End Header Styling --- */ 72 | 73 | 74 | h1 { 75 | color: #343a40; /* Darker heading */ 76 | border-bottom: 2px solid #eee; 77 | padding-bottom: 10px; 78 | margin-top: 0; 79 | font-size: 1.8em; 80 | } 81 | 82 | h2 { 83 | color: #495057; 84 | margin-top: 1.5em; 85 | } 86 | 87 | h3 { 88 | color: #6c757d; 89 | margin-top: 1.2em; 90 | } 91 | 92 | .timestamp { 93 | color: #6c757d; 94 | font-size: 0.9em; 95 | margin-bottom: 15px; 96 | } 97 | 98 | hr { 99 | border: none; 100 | border-top: 1px solid #eee; 101 | margin: 20px 0; 102 | } 103 | 104 | .brief-content { 105 | margin-top: 20px; 106 | } 107 | 108 | /* --- Article List Styling --- */ 109 
| .article-list { 110 | list-style-type: none; /* Remove default bullets */ 111 | padding-left: 0; 112 | margin-top: 20px; 113 | } 114 | 115 | .article-item { 116 | margin-bottom: 12px; 117 | padding-bottom: 12px; 118 | border-bottom: 1px solid #f1f1f1; /* Light separator */ 119 | } 120 | .article-item:last-child { 121 | border-bottom: none; 122 | } 123 | 124 | .article-link-wrapper { 125 | display: block; 126 | margin-bottom: 4px; 127 | } 128 | 129 | .article-link, .article-link-original { 130 | display: inline; 131 | text-decoration: none; 132 | color: #0056b3; /* Slightly darker blue for links */ 133 | font-weight: 500; /* Slightly bolder titles */ 134 | } 135 | .article-link:hover, .article-link-original:hover { 136 | text-decoration: underline; 137 | color: #003d80; 138 | } 139 | 140 | .article-meta { 141 | font-size: 0.85em; 142 | color: #6c757d; /* Gray for metadata */ 143 | display: block; /* Put metadata on new line */ 144 | } 145 | /* --- End Article List Styling --- */ 146 | 147 | /* --- Article List Image Styling --- */ 148 | .article-item { 149 | display: flex; /* Use flexbox for layout */ 150 | align-items: flex-start; /* Align items to the top */ 151 | gap: 15px; /* Space between image and text */ 152 | margin-bottom: 15px; 153 | padding-bottom: 15px; 154 | border-bottom: 1px solid #f1f1f1; 155 | } 156 | .article-item:last-child { border-bottom: none; } 157 | 158 | .article-image-container { 159 | flex-shrink: 0; /* Prevent image container from shrinking */ 160 | width: 100px; /* Fixed width for image/placeholder */ 161 | height: 75px; /* Fixed height */ 162 | overflow: hidden; /* Hide parts of image that don't fit */ 163 | border-radius: 4px; 164 | background-color: #f0f0f0; /* BG for container while image loads */ 165 | } 166 | 167 | .article-image { 168 | display: block; 169 | width: 100%; 170 | height: 100%; 171 | object-fit: cover; /* Cover the area, cropping if needed */ 172 | border-radius: 4px; 173 | } 174 | 175 | .article-image-placeholder { 
176 | display: flex; 177 | flex-direction: column; 178 | align-items: center; 179 | justify-content: center; 180 | width: 100%; 181 | height: 100%; 182 | background-color: #e9ecef; /* Grey background */ 183 | color: #6c757d; /* Muted text color */ 184 | font-size: 0.8em; 185 | text-align: center; 186 | border-radius: 4px; 187 | box-sizing: border-box; /* Include padding/border in element's total width/height */ 188 | border: 1px dashed #ced4da; /* Optional dashed border */ 189 | } 190 | 191 | .placeholder-icon { 192 | width: 24px; /* Adjust icon size */ 193 | height: 24px; 194 | margin-bottom: 4px; 195 | opacity: 0.6; 196 | } 197 | 198 | .article-text-content { 199 | flex-grow: 1; /* Allow text content to take remaining space */ 200 | /* No specific styles needed unless required */ 201 | } 202 | /* --- End Article List Image Styling --- */ 203 | 204 | 205 | /* --- Article Detail Image Styling --- */ 206 | .article-detail-image-container { 207 | margin-bottom: 25px; /* Space below image */ 208 | max-width: 400px; /* Limit max width on detail page */ 209 | margin-left: auto; /* Center the container */ 210 | margin-right: auto; 211 | } 212 | .article-detail-image { 213 | display: block; 214 | width: 100%; 215 | height: auto; /* Maintain aspect ratio */ 216 | border-radius: 6px; 217 | border: 1px solid #dee2e6; 218 | } 219 | .article-detail-placeholder { 220 | height: 200px; /* Give placeholder a decent height */ 221 | font-size: 1em; 222 | } 223 | .article-detail-placeholder .placeholder-icon { 224 | width: 40px; 225 | height: 40px; 226 | } 227 | 228 | 229 | /* Basic styling for Markdown elements */ 230 | p { 231 | margin-bottom: 1em; 232 | } 233 | 234 | ul:not(.article-list), ol { /* Only apply margin to non-article lists */ 235 | margin-left: 20px; 236 | margin-bottom: 1em; 237 | } 238 | 239 | li:not(.article-item) { /* Only apply margin to non-article list items */ 240 | margin-bottom: 0.5em; 241 | } 242 | 243 | strong, b { 244 | font-weight: bold; 245 | } 246 | 
247 | em, i { 248 | font-style: italic; 249 | } 250 | 251 | code { 252 | background-color: #e9ecef; 253 | padding: 0.2em 0.4em; 254 | border-radius: 3px; 255 | font-family: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; 256 | font-size: 0.9em; 257 | } 258 | 259 | pre { 260 | background-color: #e9ecef; 261 | padding: 15px; 262 | border-radius: 5px; 263 | overflow-x: auto; 264 | font-size: 0.9em; 265 | } 266 | 267 | pre code { 268 | background-color: transparent; 269 | padding: 0; 270 | } 271 | 272 | blockquote { 273 | border-left: 4px solid #ccc; 274 | padding-left: 15px; 275 | margin-left: 0; 276 | color: #555; 277 | font-style: italic; 278 | } 279 | 280 | /* --- Pagination Styling --- */ 281 | .pagination { 282 | margin-top: 30px; 283 | padding-top: 15px; 284 | border-top: 1px solid #eee; 285 | text-align: center; 286 | font-size: 0.95em; 287 | } 288 | 289 | .page-link { 290 | display: inline-block; 291 | padding: 8px 15px; 292 | margin: 0 5px; 293 | text-decoration: none; 294 | color: #007bff; 295 | border: 1px solid #dee2e6; 296 | border-radius: 4px; 297 | transition: background-color 0.2s ease, color 0.2s ease; 298 | } 299 | 300 | .page-link:hover { 301 | background-color: #e9ecef; 302 | color: #0056b3; 303 | border-color: #ced4da; 304 | } 305 | 306 | .page-link.disabled { 307 | color: #6c757d; 308 | pointer-events: none; /* Make it non-clickable */ 309 | background-color: #f8f9fa; 310 | border-color: #dee2e6; 311 | } 312 | 313 | .page-info { 314 | display: inline-block; 315 | padding: 8px 15px; 316 | margin: 0 10px; 317 | color: #495057; 318 | font-weight: 500; 319 | } 320 | 321 | /* article page */ 322 | .article-details dt { 323 | font-weight: bold; 324 | margin-top: 1em; 325 | color: #343a40; 326 | } 327 | 328 | /* Style for the collapsible header (dt) */ 329 | .collapsible-header { 330 | display: flex; 331 | justify-content: space-between; /* Puts text left, icon right */ 332 | align-items: center; 333 | } 334 | 335 | 
/* Style for the icon container */ 336 | .toggle-icon-container { 337 | cursor: pointer; 338 | padding: 2px 5px; 339 | border-radius: 4px; 340 | color: #6c757d; /* Muted icon color */ 341 | display: inline-flex; 342 | align-items: center; 343 | } 344 | .toggle-icon-container:hover { 345 | background-color: #e9ecef; 346 | color: #007bff; /* Highlight on hover */ 347 | } 348 | 349 | /* Style for the SVG icons themselves */ 350 | .toggle-icon-container svg { 351 | width: 16px; 352 | height: 16px; 353 | } 354 | 355 | .article-details dd { 356 | margin-left: 0; /* Reset browser default indent */ 357 | margin-bottom: 0.8em; 358 | color: #495057; 359 | padding-left: 1em; /* Add our own indent */ 360 | border-left: 3px solid #e9ecef; /* Subtle left border */ 361 | } 362 | 363 | .article-details dd a { /* Style link within details */ 364 | color: #0056b3; 365 | text-decoration: none; 366 | } 367 | .article-details dd a:hover { 368 | text-decoration: underline; 369 | } 370 | 371 | .content-block { /* Common style for summary/raw */ 372 | background-color: #f8f9fa; 373 | padding: 10px 15px; 374 | border-radius: 4px; 375 | border: 1px solid #e9ecef; 376 | font-size: 0.95em; 377 | line-height: 1.5; 378 | margin-top: 5px; /* Space from the <dt> */ 379 | } 380 | 381 | .content-block.raw-content pre { 382 | white-space: pre-wrap; /* CSS3 */ 383 | white-space: -moz-pre-wrap; /* Mozilla, since 1999 */ 384 | white-space: -pre-wrap; /* Opera 4-6 */ 385 | white-space: -o-pre-wrap; /* Opera 7 */ 386 | word-wrap: break-word; /* Internet Explorer 5.5+ */ 387 | max-height: 400px; /* Limit height */ 388 | overflow-y: auto; /* Add scrollbar if needed */ 389 | margin: 0; /* Reset default pre margin */ 390 | background-color: transparent; /* Inherit from parent */ 391 | padding: 0; /* Reset default pre padding */ 392 | border: none; /* Reset default pre border */ 393 | font-size: 0.9em; /* Slightly smaller for dense text */ 394 | } 395 | 396 | /* --- Impact Score Styling --- */ 397 | 
.impact-score { 398 | display: inline-block; /* Changed from inline to inline-block for padding/margin */ 399 | padding: 3px 8px; 400 | font-size: 0.8em; 401 | font-weight: bold; 402 | border-radius: 10px; /* Rounded corners */ 403 | color: #fff; /* Default white text */ 404 | margin-right: 8px; /* Space between badge and title */ 405 | min-width: 1.5em; /* Ensure single digits have decent width */ 406 | text-align: center; 407 | vertical-align: middle; /* Align better with text */ 408 | line-height: 1.2; /* Adjust line height for padding */ 409 | } 410 | 411 | /* Default / Unknown */ 412 | .impact-score.score-unknown { background-color: #adb5bd; color: #fff;} /* Grey */ 413 | 414 | /* Color Scale (Example - adjust colors as desired) */ 415 | .impact-score.score-1, .impact-score.score-2 { background-color: #28a745; } /* Green */ 416 | .impact-score.score-3, .impact-score.score-4 { background-color: #198754; } /* Darker Green */ 417 | .impact-score.score-5, .impact-score.score-6 { background-color: #ffc107; color: #333; } /* Yellow (darker text) */ 418 | .impact-score.score-7, .impact-score.score-8 { background-color: #fd7e14; } /* Orange */ 419 | .impact-score.score-9, .impact-score.score-10 { background-color: #dc3545; } /* Red */ 420 | 421 | /* Adjust positioning slightly on the article list item */ 422 | .article-item .impact-score { 423 | /* Optional: Tweak alignment if needed */ 424 | position: relative; 425 | top: -1px; 426 | } 427 | 428 | .filter-sort-form { 429 | display: flex; 430 | flex-direction: column; /* Stack sections vertically */ 431 | gap: 10px; /* Space between sections */ 432 | margin-bottom: 20px; 433 | padding: 15px; 434 | background-color: #f8f9fa; 435 | border: 1px solid #dee2e6; 436 | border-radius: 5px; 437 | } 438 | 439 | .form-section { 440 | /* Base styles for sections within the form if needed */ 441 | } 442 | 443 | .form-section:last-child { 444 | border-bottom: none; 445 | padding-bottom: 0; 446 | } 447 | 448 | .form-row { 449 | 
display: flex; 450 | flex-wrap: wrap; /* Allow wrapping */ 451 | gap: 15px; /* Space between profile and search */ 452 | align-items: center; 453 | border-bottom: 1px solid #eee; 454 | padding-bottom: 10px; 455 | } 456 | 457 | /* --- Search Filter Styling --- */ 458 | .search-filter { 459 | display: flex; 460 | align-items: center; 461 | flex-grow: 1; /* Allow search to take more space */ 462 | background-color: #fff; /* White background */ 463 | border: 1px solid #ced4da; 464 | border-radius: 20px; /* Rounded corners */ 465 | padding: 0px 5px 0px 12px; /* Padding left for text, less right for button */ 466 | box-shadow: 0 1px 3px rgba(0,0,0,0.04); 467 | } 468 | 469 | .search-filter input[type="search"] { 470 | border: none; 471 | outline: none; 472 | padding: 8px 5px; 473 | font-size: 0.95em; 474 | flex-grow: 1; /* Input takes available space */ 475 | background: transparent; 476 | min-width: 150px; /* Minimum width */ 477 | } 478 | /* Hide browser's default cancel button */ 479 | .search-filter input[type="search"]::-webkit-search-cancel-button, 480 | .search-filter input[type="search"]::-webkit-search-decoration { 481 | -webkit-appearance: none; 482 | appearance: none; 483 | } 484 | 485 | 486 | .btn-search { 487 | background: none; 488 | border: none; 489 | padding: 5px 8px; 490 | cursor: pointer; 491 | color: #6c757d; 492 | } 493 | .btn-search:hover { 494 | color: #007bff; 495 | } 496 | .btn-search svg { /* Style embedded SVG */ 497 | display: block; /* Remove extra space below SVG */ 498 | width: 18px; 499 | height: 18px; 500 | } 501 | 502 | .btn-clear-search { 503 | background: none; 504 | border: none; 505 | color: #adb5bd; 506 | font-size: 1.4em; 507 | line-height: 1; 508 | padding: 0 8px; 509 | cursor: pointer; 510 | text-decoration: none; 511 | margin-left: 4px; /* Space from search button */ 512 | } 513 | .btn-clear-search:hover { 514 | color: #dc3545; /* Red on hover */ 515 | } 516 | 517 | /* --- Collapsible Date Filter Styling --- */ 518 | 
.date-filter-toggle { 519 | background-color: transparent; 520 | border: none; 521 | color: #007bff; 522 | cursor: pointer; 523 | font-size: 0.9em; 524 | padding: 5px 0; 525 | width: 100%; 526 | text-align: left; 527 | margin-bottom: 10px; 528 | } 529 | .date-filter-toggle:hover { 530 | text-decoration: underline; 531 | } 532 | .toggle-icon { 533 | display: inline-block; 534 | margin-left: 5px; 535 | font-weight: bold; 536 | width: 1em; /* Fixed width */ 537 | text-align: center; 538 | } 539 | 540 | .date-filter-content { 541 | /* display: none; */ /* Set inline style initially */ 542 | padding: 15px; 543 | border: 1px solid #e9ecef; 544 | border-radius: 4px; 545 | background-color: #fff; /* White background inside */ 546 | margin-top: -10px; /* Pull up slightly */ 547 | display: flex; /* Use flex for layout inside */ 548 | flex-direction: column; 549 | gap: 15px; /* Space between date inputs and presets */ 550 | } 551 | 552 | /* Ensure date inputs and presets display nicely within collapsible */ 553 | .date-filter-content .date-inputs, 554 | .date-filter-content .preset-buttons { 555 | display: flex; 556 | flex-wrap: wrap; 557 | align-items: center; 558 | gap: 8px; 559 | } 560 | 561 | .filter-controls { 562 | display: flex; 563 | flex-wrap: wrap; /* Allow wrapping on smaller screens */ 564 | gap: 15px; /* Space between sections */ 565 | align-items: center; 566 | } 567 | 568 | .date-inputs { 569 | display: flex; 570 | align-items: center; 571 | gap: 8px; /* Space between label, input, button */ 572 | flex-grow: 1; /* Allow date inputs to take available space */ 573 | } 574 | 575 | .date-inputs label { 576 | font-size: 0.9em; 577 | color: #495057; 578 | } 579 | 580 | .date-inputs input[type="date"] { 581 | padding: 5px 8px; 582 | border: 1px solid #ced4da; 583 | border-radius: 4px; 584 | font-size: 0.9em; 585 | } 586 | 587 | .preset-buttons { 588 | display: flex; 589 | gap: 8px; 590 | flex-wrap: wrap; /* Allow buttons to wrap */ 591 | } 592 | 593 | /* General 
Button Styling */ 594 | .btn { 595 | display: inline-block; 596 | padding: 6px 12px; 597 | font-size: 0.9em; 598 | font-weight: 400; 599 | line-height: 1.5; 600 | text-align: center; 601 | text-decoration: none; 602 | vertical-align: middle; 603 | cursor: pointer; 604 | user-select: none; 605 | border: 1px solid transparent; 606 | border-radius: 4px; 607 | transition: color .15s ease-in-out,background-color .15s ease-in-out,border-color .15s ease-in-out,box-shadow .15s ease-in-out; 608 | margin: 0.5em 0 0.5em 0; 609 | } 610 | 611 | .btn-filter { 612 | color: #fff; 613 | background-color: #007bff; 614 | border-color: #007bff; 615 | } 616 | .btn-filter:hover { 617 | background-color: #0056b3; 618 | border-color: #0056b3; 619 | } 620 | 621 | .btn-clear { 622 | color: #6c757d; 623 | background-color: transparent; 624 | border: 1px solid #6c757d; 625 | } 626 | .btn-clear:hover { 627 | color: #fff; 628 | background-color: #6c757d; 629 | } 630 | 631 | .btn-preset { 632 | color: #007bff; 633 | background-color: #e7f1ff; /* Light blue background */ 634 | border: 1px solid #b8d6ff; 635 | } 636 | .btn-preset:hover { 637 | background-color: #cde2ff; 638 | border-color: #99c3ff; 639 | } 640 | .btn-preset.active { 641 | color: #fff; 642 | background-color: #007bff; 643 | border-color: #007bff; 644 | font-weight: bold; 645 | } 646 | 647 | /* --- End Filter Controls Styling --- */ 648 | 649 | /* --- Sorting Controls Styling (adjust if needed) --- */ 650 | .sort-controls { 651 | margin-bottom: 20px; 652 | padding-bottom: 10px; 653 | font-size: 0.9em; 654 | color: #6c757d; 655 | border-top: 1px solid #eee; 656 | padding-top: 10px; 657 | } 658 | 659 | /* --- Sorting Controls Styling --- */ 660 | 661 | .sort-link { 662 | text-decoration: none; 663 | color: #007bff; /* Link blue */ 664 | margin-left: 15px; 665 | padding: 3px 5px; 666 | border-radius: 3px; 667 | } 668 | 669 | .sort-link:hover { 670 | background-color: #e9ecef; 671 | text-decoration: none; 672 | } 673 | 674 | 
.sort-link.active { 675 | font-weight: bold; 676 | color: #0056b3; /* Darker blue for active */ 677 | background-color: #d6eaff; /* Light blue background for active */ 678 | } 679 | 680 | .sort-indicator { 681 | display: inline-block; 682 | margin-left: 4px; 683 | font-size: 0.8em; /* Make indicator slightly smaller */ 684 | color: #343a40; /* Dark color for indicator */ 685 | } 686 | 687 | /* --- Feed Profile Filter & Badge Styling --- */ 688 | .profile-filter { 689 | margin-bottom: 15px; 690 | padding-bottom: 0; 691 | font-size: 0.9em; 692 | color: #6c757d; 693 | } 694 | 695 | .profile-filter.form-section select { /* Style the select dropdown */ 696 | padding: 5px 8px; 697 | border: 1px solid #ced4da; 698 | border-radius: 4px; 699 | font-size: 0.9em; 700 | margin-left: 5px; 701 | } 702 | .profile-filter.form-section label { 703 | font-size: 0.9em; 704 | color: #495057; 705 | } 706 | 707 | .profile-link { 708 | text-decoration: none; 709 | color: #007bff; 710 | margin-left: 10px; 711 | padding: 3px 6px; 712 | border-radius: 3px; 713 | } 714 | .profile-link:hover { 715 | background-color: #e9ecef; 716 | text-decoration: none; 717 | } 718 | .profile-link.active { 719 | font-weight: bold; 720 | color: #fff; 721 | background-color: #6c757d; /* Grey background for active profile */ 722 | } 723 | 724 | .profile-badge { 725 | display: inline-block; 726 | padding: 2px 7px; 727 | font-size: 0.8em; 728 | font-weight: 500; 729 | border-radius: 4px; 730 | background-color: #6c757d; /* Default grey */ 731 | color: #fff; 732 | margin-left: 8px; 733 | vertical-align: middle; 734 | } 735 | .profile-badge.list-badge { /* Smaller badge on list items */ 736 | font-size: 0.7em; 737 | padding: 1px 5px; 738 | margin-right: 5px; /* Space before impact score */ 739 | margin-left: 0; 740 | } 741 | 742 | .relevance-notice { 743 | font-weight: bold; 744 | color: #17a2b8; /* Teal color or similar */ 745 | font-size: 0.9em; 746 | } 747 | 748 | .sort-link-disabled { 749 | color: #adb5bd; /* 
Lighter grey for disabled */ 750 | text-decoration: none; 751 | margin-left: 5px; 752 | cursor: default; 753 | } 754 | 755 | /* --- Flash Message Styling --- */ 756 | .flash-messages { 757 | margin-bottom: 15px; 758 | } 759 | .flash { 760 | padding: 10px 15px; 761 | margin-bottom: 10px; 762 | border: 1px solid transparent; 763 | border-radius: 4px; 764 | font-size: 0.9em; 765 | } 766 | .flash-success { color: #0f5132; background-color: #d1e7dd; border-color: #badbcc; } 767 | .flash-error { color: #842029; background-color: #f8d7da; border-color: #f5c2c7; } 768 | .flash-warning { color: #664d03; background-color: #fff3cd; border-color: #ffecb5; } 769 | .flash-info { color: #055160; background-color: #cff4fc; border-color: #b6effb; } /* For general info */ 770 | 771 | /* --- Add Article Form Styling --- */ 772 | .add-article-form { 773 | margin-top: 20px; 774 | padding: 20px; 775 | background-color: #f8f9fa; 776 | border: 1px solid #dee2e6; 777 | border-radius: 5px; 778 | } 779 | .form-group { 780 | margin-bottom: 1rem; 781 | } 782 | .form-group label { 783 | display: block; 784 | margin-bottom: .5rem; 785 | font-weight: 500; 786 | font-size: 0.9em; 787 | } 788 | .form-control { 789 | display: block; 790 | width: 100%; 791 | padding: .375rem .75rem; 792 | font-size: 1rem; 793 | font-weight: 400; 794 | line-height: 1.5; 795 | color: #212529; 796 | background-color: #fff; 797 | background-clip: padding-box; 798 | border: 1px solid #ced4da; 799 | appearance: none; /* For select */ 800 | border-radius: .25rem; 801 | transition: border-color .15s ease-in-out,box-shadow .15s ease-in-out; 802 | box-sizing: border-box; /* Include padding and border in the element's total width and height */ 803 | } 804 | .form-control:focus { 805 | border-color: #86b7fe; 806 | outline: 0; 807 | box-shadow: 0 0 0 .25rem rgba(13,110,253,.25); 808 | } 809 | .form-text { 810 | display: block; 811 | margin-top: .25rem; 812 | font-size: .8em; 813 | color: #6c757d; 814 | } 815 | .btn-primary { /* 
For the submit button */ 816 | color: #fff; 817 | background-color: #0d6efd; 818 | border-color: #0d6efd; 819 | } 820 | .btn-primary:hover { 821 | color: #fff; 822 | background-color: #0b5ed7; 823 | border-color: #0a58ca; 824 | } 825 | 826 | /* --- Header Action Button Styling --- */ 827 | .btn-add-article { 828 | color: #198754; /* Green text */ 829 | background-color: #e8f3ed; /* Light green background */ 830 | border: 1px solid #a3d0b9; 831 | font-weight: 500; 832 | display: inline-flex; 833 | align-items: center; 834 | gap: 6px; /* Space between icon and text */ 835 | margin: 0; /* Override any margin from .btn */ 836 | } 837 | 838 | .btn-add-article:visited { 839 | color: #198754; 840 | } 841 | 842 | .btn-add-article:hover { 843 | color: #fff; 844 | background-color: #198754; /* Solid green on hover */ 845 | border-color: #198754; 846 | } 847 | -------------------------------------------------------------------------------- /run_briefing.py: -------------------------------------------------------------------------------- 1 | # simple-meridian/run_briefing.py 2 | 3 | import os 4 | import importlib 5 | import feedparser 6 | from datetime import datetime 7 | import json 8 | import time 9 | import re 10 | import numpy as np 11 | from sklearn.cluster import KMeans 12 | from dotenv import load_dotenv 13 | import openai 14 | import argparse 15 | 16 | from urllib.parse import urljoin 17 | 18 | from utils import fetch_article_content_and_og_image 19 | 20 | try: 21 | import config_base as config # Load base config first 22 | except ImportError: 23 | print("ERROR: config_base.py not found. 
Please ensure it exists.") 24 | exit(1) 25 | 26 | import database 27 | from models import Article, get_session 28 | from sqlmodel import select 29 | 30 | # --- Setup --- 31 | load_dotenv() 32 | API_KEY = os.getenv("DEEPSEEK_API_KEY") 33 | EMBEDDING_API_KEY = os.getenv("EMBEDDING_API_KEY") 34 | 35 | if not API_KEY: 36 | raise ValueError("DEEPSEEK_API_KEY not found in .env file") 37 | 38 | if not EMBEDDING_API_KEY: 39 | raise ValueError("EMBEDDING_API_KEY not found in .env file") 40 | 41 | # Both providers expose OpenAI-compatible APIs, so the openai client is reused with a custom base_url 42 | client = openai.Client(api_key=API_KEY, base_url="https://api.deepseek.com/v1") 43 | embedding_client = openai.Client(api_key=EMBEDDING_API_KEY, base_url="https://api.together.xyz/v1") 44 | 45 | def call_deepseek_chat(prompt, model=config.DEEPSEEK_CHAT_MODEL, system_prompt=None): 46 | """Calls the Deepseek Chat API.""" 47 | messages = [] 48 | if system_prompt: 49 | messages.append({"role": "system", "content": system_prompt}) 50 | messages.append({"role": "user", "content": prompt}) 51 | 52 | try: 53 | response = client.chat.completions.create( 54 | model=model, 55 | messages=messages, 56 | max_tokens=2048, # Adjust as needed 57 | temperature=0.7, # Adjust for desired creativity/factuality 58 | ) 59 | return response.choices[0].message.content.strip() 60 | except Exception as e: 61 | print(f"Error calling Deepseek Chat API: {e}") 62 | # Implement retry logic or better error handling here if needed 63 | time.sleep(1) # Basic backoff 64 | return None 65 | 66 | def get_deepseek_embedding(text, model=config.EMBEDDING_MODEL): 67 | """Gets an embedding vector for the given text.""" 68 | print(f"INFO: Attempting to get embedding for text snippet: '{text[:50]}...'") 69 | 70 | try: 71 | response = embedding_client.embeddings.create( 72 | model=model, # Use the actual model name from the embedding provider's docs 73 | input=[text] # API expects a list of strings 74 | ) 75 | # Access the embedding vector based on the actual API response structure 76 | if response.data and
len(response.data) > 0: 77 | embedding = response.data[0].embedding 78 | # Validate embedding is not empty and has valid structure 79 | if embedding and len(embedding) > 0: 80 | return embedding 81 | else: 82 | print(f"Warning: Empty or invalid embedding returned for text.") 83 | return None 84 | else: 85 | print(f"Warning: No embedding data in API response.") 86 | return None 87 | except Exception as e: 88 | print(f"Error calling Embedding API: {e}") 89 | return None 90 | 91 | # --- Core Functions --- 92 | 93 | def scrape_articles(feed_profile, rss_feeds): # Added params 94 | """Scrapes articles for a specific feed profile.""" 95 | print(f"\n--- Starting Article Scraping [{feed_profile}] ---") 96 | new_articles_count = 0 97 | if not rss_feeds: 98 | print(f"Warning: No RSS_FEEDS defined for profile '{feed_profile}'. Skipping scrape.") 99 | return 100 | 101 | for feed_url in rss_feeds: 102 | print(f"Fetching feed: {feed_url}") 103 | feed = feedparser.parse(feed_url) 104 | 105 | if feed.bozo: print(f"Warning: Potential issue parsing feed {feed_url}: {feed.bozo_exception}") 106 | 107 | for entry in feed.entries: 108 | url = entry.get('link') 109 | title = entry.get('title', 'No Title') 110 | published_parsed = entry.get('published_parsed') 111 | published_date = datetime(*published_parsed[:6]) if published_parsed else datetime.now() 112 | feed_source = feed.feed.get('title', feed_url) 113 | 114 | if not url: continue 115 | 116 | # --- Check if article exists --- 117 | with get_session() as session: 118 | exists = session.exec(select(Article).where(Article.url == url)).first() 119 | if exists: continue 120 | # --- End Check --- 121 | 122 | print(f"Processing new entry: {title} ({url})") 123 | 124 | # --- 1. 
Try getting image from RSS feed --- 125 | rss_image_url = None 126 | # Check enclosures 127 | if 'enclosures' in entry: 128 | for enc in entry.enclosures: 129 | if enc.get('type', '').startswith('image/'): 130 | rss_image_url = enc.get('href') 131 | break # Take the first image enclosure 132 | # Check media_content if no enclosure image found 133 | if not rss_image_url and 'media_content' in entry: 134 | for media in entry.media_content: 135 | if media.get('medium') == 'image' and media.get('url'): 136 | rss_image_url = media.get('url') 137 | break # Take the first media image 138 | elif media.get('type', '').startswith('image/') and media.get('url'): 139 | rss_image_url = media.get('url') 140 | break 141 | # Check simple image tag (less common) 142 | if not rss_image_url and 'image' in entry and isinstance(entry.image, dict) and entry.image.get('url'): 143 | rss_image_url = entry.image.get('url') 144 | 145 | if rss_image_url: 146 | print(f" Found image in RSS: {rss_image_url[:60]}...") 147 | # --- End RSS Image Check --- 148 | 149 | # --- 2. Fetch Article Content & OG Image --- 150 | print(f" Fetching article content and OG image...") 151 | fetch_result = fetch_article_content_and_og_image(url) 152 | raw_content = fetch_result['content'] 153 | og_image_url = fetch_result['og_image'] 154 | # --- End Fetch --- 155 | 156 | if not raw_content: 157 | print(f" Skipping article, failed to extract main content: {title}") 158 | continue 159 | 160 | # --- 3. 
Determine Final Image URL and Save --- 161 | final_image_url = rss_image_url if rss_image_url else og_image_url 162 | if final_image_url: 163 | print(f" Using image URL: {final_image_url[:60]}...") 164 | else: 165 | print(" No image found in RSS or OG tags.") 166 | 167 | article_id = database.add_article( 168 | url, title, published_date, feed_source, raw_content, 169 | feed_profile, 170 | final_image_url 171 | ) 172 | if article_id: new_articles_count += 1 173 | time.sleep(0.5) # Be polite 174 | 175 | print(f"--- Scraping Finished [{feed_profile}]. Added {new_articles_count} new articles. ---") 176 | 177 | 178 | def process_articles(feed_profile, effective_config): 179 | """Processes unprocessed articles: summarizes and generates embeddings.""" 180 | print("\n--- Starting Article Processing ---") 181 | chat_model = getattr(effective_config, 'DEEPSEEK_CHAT_MODEL', 'deepseek-chat') # Get model from effective config 182 | summary_prompt_template = getattr(effective_config, 'PROMPT_ARTICLE_SUMMARY', config.PROMPT_ARTICLE_SUMMARY) 183 | 184 | unprocessed = database.get_unprocessed_articles(feed_profile, 1000) 185 | processed_count = 0 186 | if not unprocessed: 187 | print("No new articles to process.") 188 | return 189 | 190 | print(f"Found {len(unprocessed)} articles to process.") 191 | for article in unprocessed: 192 | print(f"Processing article ID: {article['id']} - {article['url'][:50]}...") 193 | 194 | # 1. Summarize using Deepseek Chat 195 | # Format the potentially profile-specific summary prompt 196 | summary_prompt = summary_prompt_template.format( 197 | article_content=article['raw_content'][:4000] # Limit context 198 | ) 199 | summary = call_deepseek_chat(summary_prompt, model=chat_model) 200 | 201 | if not summary: 202 | print(f"Skipping article {article['id']} due to summarization error.") 203 | continue 204 | 205 | print(f"Article summary is: {summary}") 206 | 207 | # 2. 
Generate Embedding using Deepseek (or alternative) 208 | # Use summary for embedding to focus on core topics and save tokens/time 209 | embedding = get_deepseek_embedding(summary) 210 | 211 | if not embedding: 212 | print(f"Skipping article {article['id']} due to embedding error.") 213 | continue # Or store article without embedding if desired 214 | 215 | # 3. Update Database 216 | database.update_article_processing(article['id'], summary, embedding) 217 | processed_count += 1 218 | print(f"Successfully processed article ID: {article['id']}") 219 | time.sleep(1) # Avoid hitting API rate limits 220 | 221 | print(f"--- Processing Finished. Processed {processed_count} articles. ---") 222 | 223 | def rate_articles(feed_profile, effective_config): 224 | """Rates the impact of processed articles using an LLM.""" 225 | print("\n--- Starting Article Impact Rating ---") 226 | if not client: 227 | print("Skipping rating: Deepseek client not initialized.") 228 | return 229 | 230 | chat_model = getattr(effective_config, 'DEEPSEEK_CHAT_MODEL', 'deepseek-chat') 231 | rating_prompt_template = getattr(effective_config, 'PROMPT_IMPACT_RATING', config.PROMPT_IMPACT_RATING) 232 | 233 | unrated = database.get_unrated_articles(feed_profile, 1000) 234 | rated_count = 0 235 | if not unrated: 236 | print("No new articles to rate.") 237 | return 238 | 239 | print(f"Found {len(unrated)} processed articles to rate.") 240 | for article in unrated: 241 | print(f"Rating article ID: {article['id']}: {article['title']}...") 242 | summary = article['processed_content'] 243 | if not summary: 244 | print(f" Skipping article {article['id']} - no summary found.") 245 | continue 246 | 247 | # Format the potentially profile-specific rating prompt 248 | rating_prompt = rating_prompt_template.format( 249 | summary=summary 250 | ) 251 | rating_response = call_deepseek_chat(rating_prompt, model=chat_model) 252 | 253 | impact_score = None 254 | if rating_response: 255 | try: 256 | # Extract first integer 
(1-10) from response using regex 257 | # This handles multi-line responses and text around the number 258 | match = re.search(r'\b([1-9]|10)\b', rating_response.strip()) 259 | if match: 260 | score = int(match.group(1)) 261 | if 1 <= score <= 10: 262 | impact_score = score 263 | print(f" Article ID {article['id']} rated as: {impact_score}") 264 | else: 265 | print(f" Warning: Rating response '{rating_response}' for article {article['id']} is out of range (1-10).") 266 | else: 267 | print(f" Warning: Could not find valid rating (1-10) in response '{rating_response}' for article {article['id']}.") 268 | except (ValueError, AttributeError) as e: 269 | print(f" Warning: Could not parse integer rating from response '{rating_response}' for article {article['id']}: {e}") 270 | else: 271 | print(f" Warning: No rating response received for article {article['id']}.") 272 | 273 | # Update database even if rating failed (impact_score will be None, prevents re-attempting failed ones immediately) 274 | # Or only update if impact_score is not None: 275 | if impact_score is not None: 276 | database.update_article_rating(article['id'], impact_score) 277 | rated_count += 1 278 | # else: # Decide if you want to mark failed attempts differently 279 | # database.update_article_rating(article['id'], -1) # Example: Mark as failed with -1? Or leave NULL? Leaving NULL for now. 280 | 281 | time.sleep(1) # API rate limiting 282 | 283 | print(f"--- Rating Finished. Rated {rated_count} articles. 
---") 284 | 285 | 286 | def generate_brief(feed_profile, effective_config): # Added feed_profile param 287 | """Generates the briefing for a specific feed profile.""" 288 | print(f"\n--- Starting Brief Generation [{feed_profile}] ---") 289 | # Get articles *for this specific profile* 290 | articles = database.get_articles_for_briefing( 291 | config.BRIEFING_ARTICLE_LOOKBACK_HOURS, 292 | feed_profile 293 | ) 294 | 295 | if not articles or len(articles) < config.MIN_ARTICLES_FOR_BRIEFING: 296 | print(f"Not enough recent articles ({len(articles)}) for profile '{feed_profile}'. Min required: {config.MIN_ARTICLES_FOR_BRIEFING}.") 297 | return 298 | 299 | print(f"Generating brief from {len(articles)} articles.") 300 | 301 | # Prepare data for clustering 302 | article_ids = [a['id'] for a in articles] 303 | summaries = [a['processed_content'] for a in articles] 304 | embeddings = [json.loads(a['embedding']) for a in articles if a['embedding']] # Load JSON string 305 | 306 | if len(embeddings) != len(articles): 307 | print("Warning: Some articles selected for briefing are missing embeddings. Proceeding with available ones.") 308 | # Filter articles, summaries, ids to match embeddings 309 | valid_indices = [i for i, a in enumerate(articles) if a['embedding']] 310 | articles = [articles[i] for i in valid_indices] 311 | article_ids = [article_ids[i] for i in valid_indices] 312 | summaries = [summaries[i] for i in valid_indices] 313 | # embeddings are already filtered 314 | 315 | if len(embeddings) < config.MIN_ARTICLES_FOR_BRIEFING: 316 | print(f"Not enough articles ({len(embeddings)}) with embeddings to cluster. 
Min required: {config.MIN_ARTICLES_FOR_BRIEFING}.") 317 | return 318 | 319 | embedding_matrix = np.array(embeddings) 320 | 321 | # Clustering (using KMeans as an example) 322 | n_clusters = min(config.N_CLUSTERS, len(embedding_matrix) // 2) # Ensure clusters < samples/2 323 | if n_clusters < 2 : # Need at least 2 clusters for KMeans typically 324 | print("Not enough articles to form meaningful clusters. Skipping clustering.") 325 | # Alternative: Treat all articles as one cluster or generate simple list summary 326 | # For now, we'll just exit brief generation 327 | return 328 | 329 | print(f"Clustering {len(embedding_matrix)} articles into {n_clusters} clusters...") 330 | try: 331 | kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10) # n_init='auto' in newer sklearn 332 | kmeans.fit(embedding_matrix) 333 | labels = kmeans.labels_ 334 | except Exception as e: 335 | print(f"Error during clustering: {e}") 336 | return 337 | 338 | # Analyze each cluster 339 | cluster_analyses = [] 340 | print("Analyzing clusters...") 341 | 342 | # *** Get the cluster analysis prompt template from effective_config *** 343 | cluster_analysis_prompt_template = getattr( 344 | effective_config, 345 | 'PROMPT_CLUSTER_ANALYSIS', # Look for this constant 346 | config.PROMPT_CLUSTER_ANALYSIS # Fallback to default if not found 347 | ) 348 | print(f"DEBUG: Using Cluster Analysis Prompt Template:\n'''{cluster_analysis_prompt_template[:100]}...'''") # Debug 349 | 350 | for i in range(n_clusters): # Use the actual n_clusters determined 351 | cluster_indices = np.where(labels == i)[0] 352 | if len(cluster_indices) == 0: continue # Skip empty clusters 353 | 354 | cluster_summaries = [summaries[idx] for idx in cluster_indices] 355 | print(f" Analyzing Cluster {i} ({len(cluster_summaries)} articles)") 356 | 357 | MAX_SUMMARIES_PER_CLUSTER = 10 # Consider making this configurable too? 
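The per-cluster loop here (select indices for each KMeans label via `np.where`, then cap the summaries joined into the prompt at `MAX_SUMMARIES_PER_CLUSTER`) can be sketched without sklearn; the labels and summaries below are made-up illustration data:

```python
from collections import defaultdict

MAX_SUMMARIES_PER_CLUSTER = 10  # mirrors the cap used in generate_brief()

def group_summaries_by_label(labels, summaries, cap=MAX_SUMMARIES_PER_CLUSTER):
    """Group each summary under its cluster label, keeping at most `cap` per cluster."""
    clusters = defaultdict(list)
    for label, summary in zip(labels, summaries):
        if len(clusters[label]) < cap:
            clusters[label].append(summary)
    return dict(clusters)

# Hypothetical label assignments, shaped like KMeans.labels_
labels = [0, 1, 0, 1, 0]
summaries = ["s0", "s1", "s2", "s3", "s4"]
grouped = group_summaries_by_label(labels, summaries)
# grouped[0] -> ["s0", "s2", "s4"]; grouped[1] -> ["s1", "s3"]
```

Capping while grouping (rather than slicing afterwards, as the loop above does with `[:MAX_SUMMARIES_PER_CLUSTER]`) gives the same prompt payload but avoids holding every member summary per cluster.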
358 | cluster_summaries_text = "\n\n".join([f"- {s}" for s in cluster_summaries[:MAX_SUMMARIES_PER_CLUSTER]]) 359 | 360 | # *** Format the chosen prompt template *** 361 | analysis_prompt = cluster_analysis_prompt_template.format( 362 | cluster_summaries_text=cluster_summaries_text, 363 | feed_profile=feed_profile 364 | ) 365 | 366 | # *** Call LLM with the formatted prompt *** 367 | cluster_analysis = call_deepseek_chat(analysis_prompt) # System prompt could also be configurable 368 | 369 | if cluster_analysis: 370 | # (Consider adding more robust filtering of non-analysis responses) 371 | if "unrelated" not in cluster_analysis.lower() or len(cluster_summaries) > 2: 372 | cluster_analyses.append({"topic": f"Cluster {i+1}", "analysis": cluster_analysis, "size": len(cluster_summaries)}) 373 | time.sleep(1) # Rate limiting 374 | # --- End Analyze each cluster --- 375 | 376 | if not cluster_analyses: 377 | print("No meaningful clusters found or analyzed.") 378 | return 379 | 380 | # Sort clusters by size (number of articles) to prioritize major themes 381 | cluster_analyses.sort(key=lambda x: x['size'], reverse=True) 382 | 383 | # Synthesize Final Brief using profile-specific or default prompt 384 | brief_synthesis_prompt_template = getattr(effective_config, 'PROMPT_BRIEF_SYNTHESIS', config.PROMPT_BRIEF_SYNTHESIS) # Fallback 385 | print(f"DEBUG: Using Brief Synthesis Prompt Template:\n'''{brief_synthesis_prompt_template[:100]}...'''") # Debug print 386 | 387 | cluster_analyses_text = "" 388 | for i, cluster in enumerate(cluster_analyses[:5]): 389 | cluster_analyses_text += f"--- Cluster {i+1} ({cluster['size']} articles) ---\nAnalysis: {cluster['analysis']}\n\n" 390 | 391 | synthesis_prompt = brief_synthesis_prompt_template.format( 392 | cluster_analyses_text=cluster_analyses_text, 393 | feed_profile=feed_profile 394 | ) 395 | final_brief_md = call_deepseek_chat(synthesis_prompt) 396 | 397 | if final_brief_md: 398 | database.save_brief(final_brief_md, article_ids, 
feed_profile) 399 | print(f"--- Brief Generation Finished Successfully [{feed_profile}] ---") 400 | else: 401 | print(f"--- Brief Generation Failed [{feed_profile}]: Could not synthesize final brief. ---") 402 | 403 | # --- Main Execution --- 404 | if __name__ == "__main__": 405 | parser = argparse.ArgumentParser( 406 | description="Meridian Briefing Runner: Scrapes, processes, and generates briefings.", 407 | formatter_class=argparse.RawTextHelpFormatter # Nicer help text formatting 408 | ) 409 | parser.add_argument( 410 | '--feed', 411 | type=str, 412 | default=config.DEFAULT_FEED_PROFILE, # Use default from base config 413 | help=f"Specify the feed profile name (e.g., brazil, tech). Default: '{config.DEFAULT_FEED_PROFILE}'." 414 | ) 415 | parser.add_argument( 416 | '--rate-articles', 417 | dest='rate', 418 | action='store_true', 419 | help='Run only the article impact rating stage (requires processed articles).' 420 | ) 421 | parser.add_argument( 422 | '--scrape-articles', 423 | dest='scrape', 424 | action='store_true', 425 | help='Run only the article scraping stage.' 426 | ) 427 | parser.add_argument( 428 | '--process-articles', 429 | dest='process', 430 | action='store_true', 431 | help='Run only the article processing (summarize, embed) stage.' 432 | ) 433 | parser.add_argument( 434 | '--generate-brief', 435 | dest='generate', 436 | action='store_true', 437 | help='Run only the brief generation (cluster, analyze, synthesize) stage.' 438 | ) 439 | parser.add_argument( 440 | '--all', 441 | dest='run_all', 442 | action='store_true', 443 | help='Run all stages sequentially (scrape, process, generate).\nThis is the default behavior if no specific stage argument is given.' 
444 | ) 445 | 446 | args = parser.parse_args() 447 | 448 | # --- Load Feed Specific Config --- 449 | feed_profile_name = args.feed 450 | feed_module_name = f"feeds.{feed_profile_name}" 451 | try: 452 | feed_config = importlib.import_module(feed_module_name) 453 | print(f"Loaded feed configuration: {feed_module_name}") 454 | # Optionally merge settings if feed configs override base config values 455 | # For now, we just need RSS_FEEDS from it 456 | rss_feeds = getattr(feed_config, 'RSS_FEEDS', []) 457 | if not rss_feeds: 458 | print(f"Warning: RSS_FEEDS list not found or empty in {feed_module_name}.py") 459 | except ImportError: 460 | print(f"ERROR: Could not import feed configuration '{feed_module_name}.py'.") 461 | print(f"Please ensure the file exists and contains an RSS_FEEDS list.") 462 | # Decide how to handle: exit or continue without scraping/generation? 463 | # Let's allow processing/rating to run, but disable scrape/generate 464 | rss_feeds = feed_config = None # Indicate feed load failure; clearing feed_config also keeps the merge below from raising NameError 465 | 466 | # --- Create Effective Config --- 467 | # Start with base config vars 468 | effective_config_dict = {k: v for k, v in config.__dict__.items() if not k.startswith('__')} 469 | # Override with feed_config vars if they exist 470 | if feed_config: 471 | for k, v in feed_config.__dict__.items(): 472 | if not k.startswith('__'): 473 | effective_config_dict[k] = v 474 | 475 | # Convert dict to a simple object for easier access (optional) 476 | class EffectiveConfig: 477 | def __init__(self, dictionary): 478 | for k, v in dictionary.items(): 479 | setattr(self, k, v) 480 | effective_config = EffectiveConfig(effective_config_dict) 481 | 482 | # Ensure RSS_FEEDS is correctly set in the effective config if loaded 483 | if rss_feeds is not None: 484 | effective_config.RSS_FEEDS = rss_feeds 485 | 486 | # Default to running all if no specific stage OR --all is provided 487 | should_run_all = args.run_all or not (args.scrape or args.process or args.generate or args.rate) 488 | 489 |
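The effective-config merge performed here (public attributes of the feed module overriding those of `config_base`, wrapped for attribute access) can be sketched in isolation; the attribute names below are illustrative stand-ins for the real config constants:

```python
from types import SimpleNamespace

class EffectiveConfig:
    """Wraps a merged settings dict for attribute-style access, as run_briefing.py does."""
    def __init__(self, dictionary):
        for k, v in dictionary.items():
            setattr(self, k, v)

def merge_configs(base, override=None):
    """Public names from `override` win over `base`; dunder names are skipped."""
    merged = {k: v for k, v in vars(base).items() if not k.startswith('__')}
    if override is not None:
        merged.update(
            {k: v for k, v in vars(override).items() if not k.startswith('__')}
        )
    return EffectiveConfig(merged)

# Stand-ins for config_base and a feeds.<profile> module
base = SimpleNamespace(N_CLUSTERS=5, DEEPSEEK_CHAT_MODEL="deepseek-chat")
feed = SimpleNamespace(N_CLUSTERS=3)
cfg = merge_configs(base, feed)
# cfg.N_CLUSTERS -> 3 (feed override); cfg.DEEPSEEK_CHAT_MODEL -> "deepseek-chat" (base value)
```

Filtering on `__`-prefixed names is what keeps module machinery like `__file__` and `__name__` out of the merged settings object.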
print(f"\nMeridian Briefing Run [{feed_profile_name}] - {datetime.now()}") 490 | print("Initializing database...") 491 | database.init_db() # Initialize DB regardless of stage run 492 | 493 | current_rss_feeds = getattr(effective_config, 'RSS_FEEDS', None) 494 | 495 | if should_run_all: 496 | print("\n>>> Running ALL stages <<<") 497 | if current_rss_feeds: scrape_articles(feed_profile_name, current_rss_feeds) 498 | else: print("Skipping scrape stage: No RSS_FEEDS found for profile.") 499 | process_articles(feed_profile_name, effective_config) 500 | rate_articles(feed_profile_name, effective_config) 501 | if current_rss_feeds: generate_brief(feed_profile_name, effective_config) 502 | else: print("Skipping generate stage: No RSS_FEEDS found for profile.") 503 | else: 504 | if args.scrape: 505 | if current_rss_feeds: 506 | print(f"\n>>> Running ONLY Scrape Articles stage [{feed_profile_name}] <<<") 507 | scrape_articles(feed_profile_name, current_rss_feeds) 508 | else: print(f"Cannot run scrape stage: No RSS_FEEDS found for profile '{feed_profile_name}'.") 509 | if args.process: 510 | print("\n>>> Running ONLY Process Articles stage <<<") 511 | process_articles(feed_profile_name, effective_config) 512 | if args.rate: 513 | print("\n>>> Running ONLY Rate Articles stage <<<") 514 | rate_articles(feed_profile_name, effective_config) 515 | if args.generate: 516 | if current_rss_feeds: # Check if feeds exist, as brief relies on articles from them 517 | print(f"\n>>> Running ONLY Generate Brief stage [{feed_profile_name}] <<<") 518 | generate_brief(feed_profile_name, effective_config) 519 | else: print(f"Cannot run generate stage: No RSS_FEEDS found for profile '{feed_profile_name}'.") 520 | 521 | print(f"\nRun Finished [{feed_profile_name}] - {datetime.now()}") 522 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU AFFERO GENERAL PUBLIC 
LICENSE 2 | Version 3, 19 November 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/> 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU Affero General Public License is a free, copyleft license for 11 | software and other kinds of works, specifically designed to ensure 12 | cooperation with the community in the case of network server software. 13 | 14 | The licenses for most software and other practical works are designed 15 | to take away your freedom to share and change the works. By contrast, 16 | our General Public Licenses are intended to guarantee your freedom to 17 | share and change all versions of a program--to make sure it remains free 18 | software for all its users. 19 | 20 | When we speak of free software, we are referring to freedom, not 21 | price. Our General Public Licenses are designed to make sure that you 22 | have the freedom to distribute copies of free software (and charge for 23 | them if you wish), that you receive source code or can get it if you 24 | want it, that you can change the software or use pieces of it in new 25 | free programs, and that you know you can do these things. 26 | 27 | Developers that use our General Public Licenses protect your rights 28 | with two steps: (1) assert copyright on the software, and (2) offer 29 | you this License which gives you legal permission to copy, distribute 30 | and/or modify the software. 31 | 32 | A secondary benefit of defending all users' freedom is that 33 | improvements made in alternate versions of the program, if they 34 | receive widespread use, become available for other developers to 35 | incorporate. Many developers of free software are heartened and 36 | encouraged by the resulting cooperation. However, in the case of 37 | software used on network servers, this result may fail to come about. 
38 | The GNU General Public License permits making a modified version and 39 | letting the public access it on a server without ever releasing its 40 | source code to the public. 41 | 42 | The GNU Affero General Public License is designed specifically to 43 | ensure that, in such cases, the modified source code becomes available 44 | to the community. It requires the operator of a network server to 45 | provide the source code of the modified version running there to the 46 | users of that server. Therefore, public use of a modified version, on 47 | a publicly accessible server, gives the public access to the source 48 | code of the modified version. 49 | 50 | An older license, called the Affero General Public License and 51 | published by Affero, was designed to accomplish similar goals. This is 52 | a different license, not a version of the Affero GPL, but Affero has 53 | released a new version of the Affero GPL which permits relicensing under 54 | this license. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | TERMS AND CONDITIONS 60 | 61 | 0. Definitions. 62 | 63 | "This License" refers to version 3 of the GNU Affero General Public License. 64 | 65 | "Copyright" also means copyright-like laws that apply to other kinds of 66 | works, such as semiconductor masks. 67 | 68 | "The Program" refers to any copyrightable work licensed under this 69 | License. Each licensee is addressed as "you". "Licensees" and 70 | "recipients" may be individuals or organizations. 71 | 72 | To "modify" a work means to copy from or adapt all or part of the work 73 | in a fashion requiring copyright permission, other than the making of an 74 | exact copy. The resulting work is called a "modified version" of the 75 | earlier work or a work "based on" the earlier work. 76 | 77 | A "covered work" means either the unmodified Program or a work based 78 | on the Program. 
79 | 80 | To "propagate" a work means to do anything with it that, without 81 | permission, would make you directly or secondarily liable for 82 | infringement under applicable copyright law, except executing it on a 83 | computer or modifying a private copy. Propagation includes copying, 84 | distribution (with or without modification), making available to the 85 | public, and in some countries other activities as well. 86 | 87 | To "convey" a work means any kind of propagation that enables other 88 | parties to make or receive copies. Mere interaction with a user through 89 | a computer network, with no transfer of a copy, is not conveying. 90 | 91 | An interactive user interface displays "Appropriate Legal Notices" 92 | to the extent that it includes a convenient and prominently visible 93 | feature that (1) displays an appropriate copyright notice, and (2) 94 | tells the user that there is no warranty for the work (except to the 95 | extent that warranties are provided), that licensees may convey the 96 | work under this License, and how to view a copy of this License. If 97 | the interface presents a list of user commands or options, such as a 98 | menu, a prominent item in the list meets this criterion. 99 | 100 | 1. Source Code. 101 | 102 | The "source code" for a work means the preferred form of the work 103 | for making modifications to it. "Object code" means any non-source 104 | form of a work. 105 | 106 | A "Standard Interface" means an interface that either is an official 107 | standard defined by a recognized standards body, or, in the case of 108 | interfaces specified for a particular programming language, one that 109 | is widely used among developers working in that language. 
110 | 111 | The "System Libraries" of an executable work include anything, other 112 | than the work as a whole, that (a) is included in the normal form of 113 | packaging a Major Component, but which is not part of that Major 114 | Component, and (b) serves only to enable use of the work with that 115 | Major Component, or to implement a Standard Interface for which an 116 | implementation is available to the public in source code form. A 117 | "Major Component", in this context, means a major essential component 118 | (kernel, window system, and so on) of the specific operating system 119 | (if any) on which the executable work runs, or a compiler used to 120 | produce the work, or an object code interpreter used to run it. 121 | 122 | The "Corresponding Source" for a work in object code form means all 123 | the source code needed to generate, install, and (for an executable 124 | work) run the object code and to modify the work, including scripts to 125 | control those activities. However, it does not include the work's 126 | System Libraries, or general-purpose tools or generally available free 127 | programs which are used unmodified in performing those activities but 128 | which are not part of the work. For example, Corresponding Source 129 | includes interface definition files associated with source files for 130 | the work, and the source code for shared libraries and dynamically 131 | linked subprograms that the work is specifically designed to require, 132 | such as by intimate data communication or control flow between those 133 | subprograms and other parts of the work. 134 | 135 | The Corresponding Source need not include anything that users 136 | can regenerate automatically from other parts of the Corresponding 137 | Source. 138 | 139 | The Corresponding Source for a work in source code form is that 140 | same work. 141 | 142 | 2. Basic Permissions. 
143 | 144 | All rights granted under this License are granted for the term of 145 | copyright on the Program, and are irrevocable provided the stated 146 | conditions are met. This License explicitly affirms your unlimited 147 | permission to run the unmodified Program. The output from running a 148 | covered work is covered by this License only if the output, given its 149 | content, constitutes a covered work. This License acknowledges your 150 | rights of fair use or other equivalent, as provided by copyright law. 151 | 152 | You may make, run and propagate covered works that you do not 153 | convey, without conditions so long as your license otherwise remains 154 | in force. You may convey covered works to others for the sole purpose 155 | of having them make modifications exclusively for you, or provide you 156 | with facilities for running those works, provided that you comply with 157 | the terms of this License in conveying all material for which you do 158 | not control copyright. Those thus making or running the covered works 159 | for you must do so exclusively on your behalf, under your direction 160 | and control, on terms that prohibit them from making any copies of 161 | your copyrighted material outside their relationship with you. 162 | 163 | Conveying under any other circumstances is permitted solely under 164 | the conditions stated below. Sublicensing is not allowed; section 10 165 | makes it unnecessary. 166 | 167 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 168 | 169 | No covered work shall be deemed part of an effective technological 170 | measure under any applicable law fulfilling obligations under article 171 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 172 | similar laws prohibiting or restricting circumvention of such 173 | measures. 
174 | 175 | When you convey a covered work, you waive any legal power to forbid 176 | circumvention of technological measures to the extent such circumvention 177 | is effected by exercising rights under this License with respect to 178 | the covered work, and you disclaim any intention to limit operation or 179 | modification of the work as a means of enforcing, against the work's 180 | users, your or third parties' legal rights to forbid circumvention of 181 | technological measures. 182 | 183 | 4. Conveying Verbatim Copies. 184 | 185 | You may convey verbatim copies of the Program's source code as you 186 | receive it, in any medium, provided that you conspicuously and 187 | appropriately publish on each copy an appropriate copyright notice; 188 | keep intact all notices stating that this License and any 189 | non-permissive terms added in accord with section 7 apply to the code; 190 | keep intact all notices of the absence of any warranty; and give all 191 | recipients a copy of this License along with the Program. 192 | 193 | You may charge any price or no price for each copy that you convey, 194 | and you may offer support or warranty protection for a fee. 195 | 196 | 5. Conveying Modified Source Versions. 197 | 198 | You may convey a work based on the Program, or the modifications to 199 | produce it from the Program, in the form of source code under the 200 | terms of section 4, provided that you also meet all of these conditions: 201 | 202 | a) The work must carry prominent notices stating that you modified 203 | it, and giving a relevant date. 204 | 205 | b) The work must carry prominent notices stating that it is 206 | released under this License and any conditions added under section 207 | 7. This requirement modifies the requirement in section 4 to 208 | "keep intact all notices". 209 | 210 | c) You must license the entire work, as a whole, under this 211 | License to anyone who comes into possession of a copy. 
This 212 | License will therefore apply, along with any applicable section 7 213 | additional terms, to the whole of the work, and all its parts, 214 | regardless of how they are packaged. This License gives no 215 | permission to license the work in any other way, but it does not 216 | invalidate such permission if you have separately received it. 217 | 218 | d) If the work has interactive user interfaces, each must display 219 | Appropriate Legal Notices; however, if the Program has interactive 220 | interfaces that do not display Appropriate Legal Notices, your 221 | work need not make them do so. 222 | 223 | A compilation of a covered work with other separate and independent 224 | works, which are not by their nature extensions of the covered work, 225 | and which are not combined with it such as to form a larger program, 226 | in or on a volume of a storage or distribution medium, is called an 227 | "aggregate" if the compilation and its resulting copyright are not 228 | used to limit the access or legal rights of the compilation's users 229 | beyond what the individual works permit. Inclusion of a covered work 230 | in an aggregate does not cause this License to apply to the other 231 | parts of the aggregate. 232 | 233 | 6. Conveying Non-Source Forms. 234 | 235 | You may convey a covered work in object code form under the terms 236 | of sections 4 and 5, provided that you also convey the 237 | machine-readable Corresponding Source under the terms of this License, 238 | in one of these ways: 239 | 240 | a) Convey the object code in, or embodied in, a physical product 241 | (including a physical distribution medium), accompanied by the 242 | Corresponding Source fixed on a durable physical medium 243 | customarily used for software interchange. 
244 | 245 | b) Convey the object code in, or embodied in, a physical product 246 | (including a physical distribution medium), accompanied by a 247 | written offer, valid for at least three years and valid for as 248 | long as you offer spare parts or customer support for that product 249 | model, to give anyone who possesses the object code either (1) a 250 | copy of the Corresponding Source for all the software in the 251 | product that is covered by this License, on a durable physical 252 | medium customarily used for software interchange, for a price no 253 | more than your reasonable cost of physically performing this 254 | conveying of source, or (2) access to copy the 255 | Corresponding Source from a network server at no charge. 256 | 257 | c) Convey individual copies of the object code with a copy of the 258 | written offer to provide the Corresponding Source. This 259 | alternative is allowed only occasionally and noncommercially, and 260 | only if you received the object code with such an offer, in accord 261 | with subsection 6b. 262 | 263 | d) Convey the object code by offering access from a designated 264 | place (gratis or for a charge), and offer equivalent access to the 265 | Corresponding Source in the same way through the same place at no 266 | further charge. You need not require recipients to copy the 267 | Corresponding Source along with the object code. If the place to 268 | copy the object code is a network server, the Corresponding Source 269 | may be on a different server (operated by you or a third party) 270 | that supports equivalent copying facilities, provided you maintain 271 | clear directions next to the object code saying where to find the 272 | Corresponding Source. Regardless of what server hosts the 273 | Corresponding Source, you remain obligated to ensure that it is 274 | available for as long as needed to satisfy these requirements. 
275 | 276 | e) Convey the object code using peer-to-peer transmission, provided 277 | you inform other peers where the object code and Corresponding 278 | Source of the work are being offered to the general public at no 279 | charge under subsection 6d. 280 | 281 | A separable portion of the object code, whose source code is excluded 282 | from the Corresponding Source as a System Library, need not be 283 | included in conveying the object code work. 284 | 285 | A "User Product" is either (1) a "consumer product", which means any 286 | tangible personal property which is normally used for personal, family, 287 | or household purposes, or (2) anything designed or sold for incorporation 288 | into a dwelling. In determining whether a product is a consumer product, 289 | doubtful cases shall be resolved in favor of coverage. For a particular 290 | product received by a particular user, "normally used" refers to a 291 | typical or common use of that class of product, regardless of the status 292 | of the particular user or of the way in which the particular user 293 | actually uses, or expects or is expected to use, the product. A product 294 | is a consumer product regardless of whether the product has substantial 295 | commercial, industrial or non-consumer uses, unless such uses represent 296 | the only significant mode of use of the product. 297 | 298 | "Installation Information" for a User Product means any methods, 299 | procedures, authorization keys, or other information required to install 300 | and execute modified versions of a covered work in that User Product from 301 | a modified version of its Corresponding Source. The information must 302 | suffice to ensure that the continued functioning of the modified object 303 | code is in no case prevented or interfered with solely because 304 | modification has been made. 
305 | 306 | If you convey an object code work under this section in, or with, or 307 | specifically for use in, a User Product, and the conveying occurs as 308 | part of a transaction in which the right of possession and use of the 309 | User Product is transferred to the recipient in perpetuity or for a 310 | fixed term (regardless of how the transaction is characterized), the 311 | Corresponding Source conveyed under this section must be accompanied 312 | by the Installation Information. But this requirement does not apply 313 | if neither you nor any third party retains the ability to install 314 | modified object code on the User Product (for example, the work has 315 | been installed in ROM). 316 | 317 | The requirement to provide Installation Information does not include a 318 | requirement to continue to provide support service, warranty, or updates 319 | for a work that has been modified or installed by the recipient, or for 320 | the User Product in which it has been modified or installed. Access to a 321 | network may be denied when the modification itself materially and 322 | adversely affects the operation of the network or violates the rules and 323 | protocols for communication across the network. 324 | 325 | Corresponding Source conveyed, and Installation Information provided, 326 | in accord with this section must be in a format that is publicly 327 | documented (and with an implementation available to the public in 328 | source code form), and must require no special password or key for 329 | unpacking, reading or copying. 330 | 331 | 7. Additional Terms. 332 | 333 | "Additional permissions" are terms that supplement the terms of this 334 | License by making exceptions from one or more of its conditions. 335 | Additional permissions that are applicable to the entire Program shall 336 | be treated as though they were included in this License, to the extent 337 | that they are valid under applicable law. 
If additional permissions 338 | apply only to part of the Program, that part may be used separately 339 | under those permissions, but the entire Program remains governed by 340 | this License without regard to the additional permissions. 341 | 342 | When you convey a copy of a covered work, you may at your option 343 | remove any additional permissions from that copy, or from any part of 344 | it. (Additional permissions may be written to require their own 345 | removal in certain cases when you modify the work.) You may place 346 | additional permissions on material, added by you to a covered work, 347 | for which you have or can give appropriate copyright permission. 348 | 349 | Notwithstanding any other provision of this License, for material you 350 | add to a covered work, you may (if authorized by the copyright holders of 351 | that material) supplement the terms of this License with terms: 352 | 353 | a) Disclaiming warranty or limiting liability differently from the 354 | terms of sections 15 and 16 of this License; or 355 | 356 | b) Requiring preservation of specified reasonable legal notices or 357 | author attributions in that material or in the Appropriate Legal 358 | Notices displayed by works containing it; or 359 | 360 | c) Prohibiting misrepresentation of the origin of that material, or 361 | requiring that modified versions of such material be marked in 362 | reasonable ways as different from the original version; or 363 | 364 | d) Limiting the use for publicity purposes of names of licensors or 365 | authors of the material; or 366 | 367 | e) Declining to grant rights under trademark law for use of some 368 | trade names, trademarks, or service marks; or 369 | 370 | f) Requiring indemnification of licensors and authors of that 371 | material by anyone who conveys the material (or modified versions of 372 | it) with contractual assumptions of liability to the recipient, for 373 | any liability that these contractual assumptions directly impose on 
374 | those licensors and authors. 375 | 376 | All other non-permissive additional terms are considered "further 377 | restrictions" within the meaning of section 10. If the Program as you 378 | received it, or any part of it, contains a notice stating that it is 379 | governed by this License along with a term that is a further 380 | restriction, you may remove that term. If a license document contains 381 | a further restriction but permits relicensing or conveying under this 382 | License, you may add to a covered work material governed by the terms 383 | of that license document, provided that the further restriction does 384 | not survive such relicensing or conveying. 385 | 386 | If you add terms to a covered work in accord with this section, you 387 | must place, in the relevant source files, a statement of the 388 | additional terms that apply to those files, or a notice indicating 389 | where to find the applicable terms. 390 | 391 | Additional terms, permissive or non-permissive, may be stated in the 392 | form of a separately written license, or stated as exceptions; 393 | the above requirements apply either way. 394 | 395 | 8. Termination. 396 | 397 | You may not propagate or modify a covered work except as expressly 398 | provided under this License. Any attempt otherwise to propagate or 399 | modify it is void, and will automatically terminate your rights under 400 | this License (including any patent licenses granted under the third 401 | paragraph of section 11). 402 | 403 | However, if you cease all violation of this License, then your 404 | license from a particular copyright holder is reinstated (a) 405 | provisionally, unless and until the copyright holder explicitly and 406 | finally terminates your license, and (b) permanently, if the copyright 407 | holder fails to notify you of the violation by some reasonable means 408 | prior to 60 days after the cessation. 
409 | 410 | Moreover, your license from a particular copyright holder is 411 | reinstated permanently if the copyright holder notifies you of the 412 | violation by some reasonable means, this is the first time you have 413 | received notice of violation of this License (for any work) from that 414 | copyright holder, and you cure the violation prior to 30 days after 415 | your receipt of the notice. 416 | 417 | Termination of your rights under this section does not terminate the 418 | licenses of parties who have received copies or rights from you under 419 | this License. If your rights have been terminated and not permanently 420 | reinstated, you do not qualify to receive new licenses for the same 421 | material under section 10. 422 | 423 | 9. Acceptance Not Required for Having Copies. 424 | 425 | You are not required to accept this License in order to receive or 426 | run a copy of the Program. Ancillary propagation of a covered work 427 | occurring solely as a consequence of using peer-to-peer transmission 428 | to receive a copy likewise does not require acceptance. However, 429 | nothing other than this License grants you permission to propagate or 430 | modify any covered work. These actions infringe copyright if you do 431 | not accept this License. Therefore, by modifying or propagating a 432 | covered work, you indicate your acceptance of this License to do so. 433 | 434 | 10. Automatic Licensing of Downstream Recipients. 435 | 436 | Each time you convey a covered work, the recipient automatically 437 | receives a license from the original licensors, to run, modify and 438 | propagate that work, subject to this License. You are not responsible 439 | for enforcing compliance by third parties with this License. 440 | 441 | An "entity transaction" is a transaction transferring control of an 442 | organization, or substantially all assets of one, or subdividing an 443 | organization, or merging organizations. 
If propagation of a covered 444 | work results from an entity transaction, each party to that 445 | transaction who receives a copy of the work also receives whatever 446 | licenses to the work the party's predecessor in interest had or could 447 | give under the previous paragraph, plus a right to possession of the 448 | Corresponding Source of the work from the predecessor in interest, if 449 | the predecessor has it or can get it with reasonable efforts. 450 | 451 | You may not impose any further restrictions on the exercise of the 452 | rights granted or affirmed under this License. For example, you may 453 | not impose a license fee, royalty, or other charge for exercise of 454 | rights granted under this License, and you may not initiate litigation 455 | (including a cross-claim or counterclaim in a lawsuit) alleging that 456 | any patent claim is infringed by making, using, selling, offering for 457 | sale, or importing the Program or any portion of it. 458 | 459 | 11. Patents. 460 | 461 | A "contributor" is a copyright holder who authorizes use under this 462 | License of the Program or a work on which the Program is based. The 463 | work thus licensed is called the contributor's "contributor version". 464 | 465 | A contributor's "essential patent claims" are all patent claims 466 | owned or controlled by the contributor, whether already acquired or 467 | hereafter acquired, that would be infringed by some manner, permitted 468 | by this License, of making, using, or selling its contributor version, 469 | but do not include claims that would be infringed only as a 470 | consequence of further modification of the contributor version. For 471 | purposes of this definition, "control" includes the right to grant 472 | patent sublicenses in a manner consistent with the requirements of 473 | this License. 
474 | 475 | Each contributor grants you a non-exclusive, worldwide, royalty-free 476 | patent license under the contributor's essential patent claims, to 477 | make, use, sell, offer for sale, import and otherwise run, modify and 478 | propagate the contents of its contributor version. 479 | 480 | In the following three paragraphs, a "patent license" is any express 481 | agreement or commitment, however denominated, not to enforce a patent 482 | (such as an express permission to practice a patent or covenant not to 483 | sue for patent infringement). To "grant" such a patent license to a 484 | party means to make such an agreement or commitment not to enforce a 485 | patent against the party. 486 | 487 | If you convey a covered work, knowingly relying on a patent license, 488 | and the Corresponding Source of the work is not available for anyone 489 | to copy, free of charge and under the terms of this License, through a 490 | publicly available network server or other readily accessible means, 491 | then you must either (1) cause the Corresponding Source to be so 492 | available, or (2) arrange to deprive yourself of the benefit of the 493 | patent license for this particular work, or (3) arrange, in a manner 494 | consistent with the requirements of this License, to extend the patent 495 | license to downstream recipients. "Knowingly relying" means you have 496 | actual knowledge that, but for the patent license, your conveying the 497 | covered work in a country, or your recipient's use of the covered work 498 | in a country, would infringe one or more identifiable patents in that 499 | country that you have reason to believe are valid. 
500 | 501 | If, pursuant to or in connection with a single transaction or 502 | arrangement, you convey, or propagate by procuring conveyance of, a 503 | covered work, and grant a patent license to some of the parties 504 | receiving the covered work authorizing them to use, propagate, modify 505 | or convey a specific copy of the covered work, then the patent license 506 | you grant is automatically extended to all recipients of the covered 507 | work and works based on it. 508 | 509 | A patent license is "discriminatory" if it does not include within 510 | the scope of its coverage, prohibits the exercise of, or is 511 | conditioned on the non-exercise of one or more of the rights that are 512 | specifically granted under this License. You may not convey a covered 513 | work if you are a party to an arrangement with a third party that is 514 | in the business of distributing software, under which you make payment 515 | to the third party based on the extent of your activity of conveying 516 | the work, and under which the third party grants, to any of the 517 | parties who would receive the covered work from you, a discriminatory 518 | patent license (a) in connection with copies of the covered work 519 | conveyed by you (or copies made from those copies), or (b) primarily 520 | for and in connection with specific products or compilations that 521 | contain the covered work, unless you entered into that arrangement, 522 | or that patent license was granted, prior to 28 March 2007. 523 | 524 | Nothing in this License shall be construed as excluding or limiting 525 | any implied license or other defenses to infringement that may 526 | otherwise be available to you under applicable patent law. 527 | 528 | 12. No Surrender of Others' Freedom. 529 | 530 | If conditions are imposed on you (whether by court order, agreement or 531 | otherwise) that contradict the conditions of this License, they do not 532 | excuse you from the conditions of this License. 
If you cannot convey a 533 | covered work so as to satisfy simultaneously your obligations under this 534 | License and any other pertinent obligations, then as a consequence you may 535 | not convey it at all. For example, if you agree to terms that obligate you 536 | to collect a royalty for further conveying from those to whom you convey 537 | the Program, the only way you could satisfy both those terms and this 538 | License would be to refrain entirely from conveying the Program. 539 | 540 | 13. Remote Network Interaction; Use with the GNU General Public License. 541 | 542 | Notwithstanding any other provision of this License, if you modify the 543 | Program, your modified version must prominently offer all users 544 | interacting with it remotely through a computer network (if your version 545 | supports such interaction) an opportunity to receive the Corresponding 546 | Source of your version by providing access to the Corresponding Source 547 | from a network server at no charge, through some standard or customary 548 | means of facilitating copying of software. This Corresponding Source 549 | shall include the Corresponding Source for any work covered by version 3 550 | of the GNU General Public License that is incorporated pursuant to the 551 | following paragraph. 552 | 553 | Notwithstanding any other provision of this License, you have 554 | permission to link or combine any covered work with a work licensed 555 | under version 3 of the GNU General Public License into a single 556 | combined work, and to convey the resulting work. The terms of this 557 | License will continue to apply to the part which is the covered work, 558 | but the work with which it is combined will remain governed by version 559 | 3 of the GNU General Public License. 560 | 561 | 14. Revised Versions of this License. 562 | 563 | The Free Software Foundation may publish revised and/or new versions of 564 | the GNU Affero General Public License from time to time. 
Such new versions 565 | will be similar in spirit to the present version, but may differ in detail to 566 | address new problems or concerns. 567 | 568 | Each version is given a distinguishing version number. If the 569 | Program specifies that a certain numbered version of the GNU Affero General 570 | Public License "or any later version" applies to it, you have the 571 | option of following the terms and conditions either of that numbered 572 | version or of any later version published by the Free Software 573 | Foundation. If the Program does not specify a version number of the 574 | GNU Affero General Public License, you may choose any version ever published 575 | by the Free Software Foundation. 576 | 577 | If the Program specifies that a proxy can decide which future 578 | versions of the GNU Affero General Public License can be used, that proxy's 579 | public statement of acceptance of a version permanently authorizes you 580 | to choose that version for the Program. 581 | 582 | Later license versions may give you additional or different 583 | permissions. However, no additional obligations are imposed on any 584 | author or copyright holder as a result of your choosing to follow a 585 | later version. 586 | 587 | 15. Disclaimer of Warranty. 588 | 589 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 590 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 591 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 592 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 593 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 594 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 595 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 596 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 597 | 598 | 16. Limitation of Liability. 
599 | 600 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 601 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 602 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 603 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 604 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 605 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 606 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 607 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 608 | SUCH DAMAGES. 609 | 610 | 17. Interpretation of Sections 15 and 16. 611 | 612 | If the disclaimer of warranty and limitation of liability provided 613 | above cannot be given local legal effect according to their terms, 614 | reviewing courts shall apply local law that most closely approximates 615 | an absolute waiver of all civil liability in connection with the 616 | Program, unless a warranty or assumption of liability accompanies a 617 | copy of the Program in return for a fee. 618 | 619 | END OF TERMS AND CONDITIONS 620 | 621 | How to Apply These Terms to Your New Programs 622 | 623 | If you develop a new program, and you want it to be of the greatest 624 | possible use to the public, the best way to achieve this is to make it 625 | free software which everyone can redistribute and change under these terms. 626 | 627 | To do so, attach the following notices to the program. It is safest 628 | to attach them to the start of each source file to most effectively 629 | state the exclusion of warranty; and each file should have at least 630 | the "copyright" line and a pointer to where the full notice is found. 
631 | 632 | <one line to give the program's name and a brief idea of what it does.> 633 | Copyright (C) <year> <name of author> 634 | 635 | This program is free software: you can redistribute it and/or modify 636 | it under the terms of the GNU Affero General Public License as published by 637 | the Free Software Foundation, either version 3 of the License, or 638 | (at your option) any later version. 639 | 640 | This program is distributed in the hope that it will be useful, 641 | but WITHOUT ANY WARRANTY; without even the implied warranty of 642 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 643 | GNU Affero General Public License for more details. 644 | 645 | You should have received a copy of the GNU Affero General Public License 646 | along with this program. If not, see <https://www.gnu.org/licenses/>. 647 | 648 | Also add information on how to contact you by electronic and paper mail. 649 | 650 | If your software can interact with users remotely through a computer 651 | network, you should also make sure that it provides a way for users to 652 | get its source. For example, if your program is a web application, its 653 | interface could display a "Source" link that leads users to an archive 654 | of the code. There are many ways you could offer source, and different 655 | solutions will be better for different programs; see section 13 for the 656 | specific requirements. 657 | 658 | You should also get your employer (if you work as a programmer) or school, 659 | if any, to sign a "copyright disclaimer" for the program, if necessary. 660 | For more information on this, and how to apply and follow the GNU AGPL, see 661 | <https://www.gnu.org/licenses/>. 662 | --------------------------------------------------------------------------------