├── .env.example
├── .gitignore
├── README.md
├── __init__.py
├── blog_gen_algo_v0.1.py
├── blog_gen_algo_v0.2.py
├── others.py
├── requirements.txt
└── tools
    ├── __init__.py
    ├── chatgpt.py
    ├── const.py
    ├── decision.py
    ├── file.py
    ├── logger.py
    ├── scraper.py
    ├── serpapi.py
    ├── storyblok.py
    └── subprocess.py

/.env.example:
--------------------------------------------------------------------------------
1 | # OpenAI
2 | OPENAI_API_KEY=
3 | OPENAI_MODEL=
4 | OPENAI_MAX_TOKENS=
5 | OPENAI_TEMPERATURE=
6 | OPENAI_STOP_SEQ=
7 | 
8 | # SerpAPI
9 | SERP_API_KEY=
10 | 
11 | # Service
12 | SERVICE_NAME=
13 | SERVICE_DESCRIPTION=
14 | SERVICE_URL=
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .env
2 | .idea
3 | _logs
4 | _blogs
5 | venv
6 | sitemap.xml
7 | tools/__pycache__
8 | *.csv
9 | .venv
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # BLOGEN - Blog Generation Application (Version 0.1)
2 | 
3 | BLOGEN is a blog generation application designed to create well-structured blog posts using Markdown formatting. It takes primary keywords as input and generates engaging and informative blog content for various topics. This README file provides an overview of the BLOGEN application and instructions for usage.
4 | 
5 | ## Table of Contents
6 | - [BLOGEN - Blog Generation Application (Version 0.1)](#blogen---blog-generation-application-version-01)
7 | - [Table of Contents](#table-of-contents)
8 | - [Introduction](#introduction)
9 | - [How it works](#how-it-works)
10 | - [Features](#features)
11 | - [Getting Started](#getting-started)
12 | - [Usage](#usage)
13 | - [Dependencies](#dependencies)
14 | - [Contributing](#contributing)
15 | - [License](#license)
16 | - [Roadmap](#roadmap)
17 | 
18 | ## Introduction
19 | BLOGEN is a Python-based blog generation tool that leverages OpenAI's GPT models (GPT-3.5/GPT-4) to create captivating blog posts. The application uses the primary keywords provided by the user to generate prompts and interactively calls the language model to produce content for each step of the blog creation process.
20 | 
21 | ## How it works
22 | ![LF24T35](https://github.com/rbatista191/blogen/assets/138892976/421f58a7-3291-4a92-886f-99e287f905ba)
23 | 
24 | ## Features
25 | - **Keyword-driven Content Generation**: BLOGEN accepts primary keywords as input to create blog content tailored to specific topics.
26 | - **Iterative Blog Building**: The application follows a step-by-step approach to create a complete blog post, including introduction, tone, issue, remedy, options, implementation, pros and cons (optional), and conclusion.
27 | - **Various Writing Tones**: BLOGEN offers the flexibility to generate blog content in different tones, such as informative, conversational, persuasive, and more.
28 | - **Easy Integration with Markdown**: The generated blog content is formatted using Markdown, making it easily integrable with various platforms and content management systems.
29 | - **Version Control**: This is Version 0.1 of the BLOGEN application, with potential updates and improvements planned for future releases.
30 | 
31 | ## Getting Started
32 | To get started with BLOGEN, follow these steps:
33 | 
34 | 1. Clone the BLOGEN repository to your local machine:
35 | ```
36 | git clone https://github.com/rbatista191/blogen.git
37 | ```
38 | 
39 | 2. Install the required dependencies (ensure you have Python 3.x installed):
40 | ```
41 | pip install -r requirements.txt
42 | ```
43 | 
44 | 3. Create your own `.env` file based on `.env.example` and store the following:
45 | - Obtain an API key for your preferred OpenAI GPT language model
46 | - Obtain an API key for [SerpAPI](https://serpapi.com/) (the free version includes 100 searches/month)
47 | - Define your service attributes to be fed into the blog article
48 | 
49 | 4. Upload your sitemap to the root of the workspace with filename `sitemap.xml`
50 | 
51 | 5. Alternatively, set your API keys as environment variables instead of using `.env`:
52 | ```
53 | export OPENAI_API_KEY=your_api_key
54 | export SERP_API_KEY=your_api_key
55 | ```
56 | 
57 | 6. Launch the BLOGEN application:
58 | ```
59 | python blog_gen_algo_v0.2.py [keyword]
60 | ```
61 | 7. Alternatively use Streamlit for a browser interface (the keywords are then entered in a text field on the page):
62 | ```
63 | streamlit run blog_gen_algo_v0.2.py
64 | ```
65 | 
66 | ## Usage
67 | Upon launching the BLOGEN application, you will be prompted to enter primary keywords for your blog topic. Follow the on-screen instructions to provide the necessary information.
68 | 
69 | The application will iteratively call the configured GPT models (see `step_to_model` in `blog_gen_algo_v0.2.py`) to generate content for each step of the blog creation process. The generated content will be presented in Markdown format.
70 | 
71 | Once the blog is fully generated, you can copy the Markdown content and paste it into your preferred platform or content management system for publishing.
72 | 
73 | ## Dependencies
74 | BLOGEN relies on the following Python packages (see `requirements.txt` for the full pinned list):
75 | 
76 | - `openai` and `tokencost`: access to the OpenAI API and prompt/completion cost estimation.
77 | - `streamlit`, `google-search-results` and `serpapi`: the optional browser interface and the SerpAPI clients (search results, related queries, images).
78 | - `markdown`, `md_toc` and `python-dotenv`: Markdown processing, table-of-contents generation and `.env` loading.
79 | 
80 | ## Contributing
81 | Contributions to BLOGEN are welcome! If you have ideas for improvements or bug fixes, please feel free to open an issue or submit a pull request. Before contributing, make sure to read our [Contributing Guidelines](CONTRIBUTING.md).
82 | 
83 | ## License
84 | BLOGEN is licensed under the [MIT License](LICENSE). Feel free to use, modify, and distribute this software as per the terms of the license.
85 | 
86 | For any questions or feedback, please contact me at `gaurav18115@gmail.com`.
87 | 
88 | ## Roadmap
89 | - [x] Create Key Takeaway section and shorten Introduction
90 | - [x] Fix cost calculation by including full message instead of prompt only
91 | - [x] Find a way to put anchor links on headings
92 | - [x] Create separate table-of-content section
93 | - [x] Swap Intro with Key Takeaways
94 | - [ ] Integrate [Unsplash API](https://unsplash.com/developers) to upload the picture to CMS
95 | - [ ] Improve YouTube link generation
96 | - [ ] Create logic to check the URLs to avoid 404
97 | 
98 | ---
99 | This README file was generated using BLOGEN (Version 0.2) - The Blog Generation Application.
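**Tip:** `blog_gen_algo_v0.2.py` can also push the assembled article straight to Storyblok via its `--storyblok` flag. A sketch of the invocation — the keyword below is just an example, and the `STORYBLOK_*` tokens read in `tools/const.py` must be configured for the upload to succeed:
```
python blog_gen_algo_v0.2.py --storyblok best email client for gmail
```
Without the flag, the article is only written to a Markdown file under `_blogs/`.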
100 | 
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rbatista191/blogen/b7490f80be17862eb30f7e9368d589d2b5215686/__init__.py
--------------------------------------------------------------------------------
/blog_gen_algo_v0.1.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | import sys
3 | 
4 | import streamlit as st
5 | from md_toc import build_toc
6 | import xml.etree.ElementTree as ET
7 | 
8 | from tools.chatgpt import chat_with_open_ai
9 | from tools.decision import require_data_for_prompt, require_better_prompt, find_tone_of_writing
10 | from tools.file import create_file_with_keyword, append_content_to_file
11 | from tools.logger import log_info, setup_logger
12 | from tools.serpapi import get_related_queries, get_image_with_commercial_usage
13 | from tools.subprocess import open_file_with_md_app
14 | from tools.const import SERVICE_NAME
15 | from tools.const import SERVICE_DESCRIPTION
16 | from tools.const import SERVICE_URL
17 | 
18 | 
19 | steps_prompts = [
20 |     # Step 1
21 |     "Step 1: Given the primary keywords - {primary_keywords}, generate a captivating 5-8 words blog title. "
22 |     "After that, write a 40-50 words teaser in {tone_of_writing} tone, "
23 |     "something that creates curiosity and willingness to read more in the reader's mind. "
24 |     "Make sure to write in pure markdown format, with the blog title in H1 heading, "
25 |     "and the teaser in paragraph format.",
26 |     # Step 2
27 |     "Step 2: On the basis of the user intent for asking {primary_keywords}, set up a base ground of knowledge. "
28 |     "Write facts and theories on this topic, add well-known data points and sources here. "
29 |     "Use maximum 250 words for the content. Don't reach any conclusion yet. "
30 |     "\nMake sure to write in pure markdown format, with headings and subheadings (H2 to H3), "
31 |     "paragraphs, lists and text formatting (such as bold, italic, strikethrough, etc)."
32 |     "\nLink 2-3 other of my blog posts (found in the sitemap posted below) within the content. "
33 |     "Make sure to sound natural when linking to other blog posts, i.e., the text can only be slightly altered to accommodate a better context for the link. "
34 |     "Make sure the anchor text is not the actual title of the other blog post, but rather something in the text that goes along with the rationale. "
35 |     "Sitemap: {sitemap_urls}",
36 |     # Step 3
37 |     "Step 3: If applicable, explain step by step how to do the required actions for the user intent in {primary_keywords}. "
38 |     "Use maximum 400 words for the content. Don't reach any conclusion yet. "
39 |     "Make sure to write in pure markdown format, with headings and subheadings (H2 to H3), "
40 |     "paragraphs, lists and text formatting (such as bold, italic, strikethrough, etc).",
41 |     # Step 4
42 |     "Step 4: Introduce {service_name}, described as {service_description}. "
43 |     "Explain to the user how {service_name} can help them with their problem. "
44 |     "Make sure to link {service_url} in the content. "
45 |     "Demonstrate how to use {service_name} in easy steps. Don't go beyond what is mentioned in the service description. "
46 |     "Use maximum 100 words for the content. Don't reach any conclusion yet. "
47 |     "Make sure to write in pure markdown format, with headings and subheadings (H2 to H3), "
48 |     "paragraphs, lists and text formatting (such as bold, italic, strikethrough, etc).",
49 |     # Step 5
50 |     "Step 5: Generate a conclusion based on the content of this blog. Use {tone_of_writing} tone to "
51 |     "ease the user intent to take the next step on {primary_keywords}. "
52 |     "Use maximum 150 words for the content. "
53 |     "Make sure to write in pure markdown format, with headings and subheadings (H1 to H4), "
54 |     "paragraphs, lists and text formatting (such as bold, italic, strikethrough, etc).",
55 | ]
56 | 
57 | def load_sitemap_and_extract_urls(sitemap_path):
58 |     # Parse the XML file
59 |     tree = ET.parse(sitemap_path)
60 |     root = tree.getroot()
61 | 
62 |     # Namespace, often found in sitemap files
63 |     namespace = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
64 | 
65 |     # Extract URLs
66 |     urls = [elem.text for elem in root.findall('ns:url/ns:loc', namespace)]
67 |     return urls
68 | 
69 | def generate_blog_for_keywords(primary_keywords="knee replacement surgery", service_name=SERVICE_NAME, service_description=SERVICE_DESCRIPTION, service_url=SERVICE_URL):
70 |     # Iterate through each prompt step
71 |     messages = []
72 | 
73 |     filepath = create_file_with_keyword(primary_keywords)
74 |     log_info(f'🗂️ File Created {filepath}')
75 |     open_file_with_md_app(filepath)
76 | 
77 |     secondary_keywords = get_related_queries(primary_keywords)
78 |     log_info(f'🎬 Primary Keywords: {primary_keywords}')
79 |     log_info(f'📗 Secondary Keywords: {secondary_keywords}')
80 | 
81 |     # Create the system message with primary and secondary keywords; it stays at the head of messages and steers every later completion
82 |     system_message_1 = f"SYSTEM: Act as an experienced SEO specialist and experienced content writer. " \
83 |                        f"Given a blog with topic {primary_keywords}, help in generating rich content " \
84 |                        f"for an SEO-optimized blog. " \
85 |                        f"Write a custom heading for this response. " \
86 |                        f"Naturally use primary keywords: [{primary_keywords}], and " \
87 |                        f"secondary keywords: [{secondary_keywords}] wherever it fits. " \
88 |                        f"Use John Gruber’s Markdown to format your responses. " \
89 |                        f"Use original content, avoid plagiarism, increase readability."
90 | 
91 |     log_info(f'🤖 System:\n{system_message_1}\n\n')
92 |     messages.append({"role": "system", "content": system_message_1})
93 | 
94 |     tone_of_writing = find_tone_of_writing(primary_keywords, messages)
95 | 
96 |     sitemap_path = 'sitemap.xml'
97 |     sitemap_urls = load_sitemap_and_extract_urls(sitemap_path)
98 |     log_info(f'🗺️ Sitemap URLs: {sitemap_urls}')
99 | 
100 |     i = 1
101 |     total_words = 0
102 |     already_sourced = []
103 |     for step_prompt in steps_prompts:
104 |         # Pre-defined prompt
105 |         prompt = step_prompt.format(primary_keywords=primary_keywords,
106 |                                     tone_of_writing=tone_of_writing,
107 |                                     service_name=service_name,
108 |                                     service_description=service_description,
109 |                                     service_url=service_url,
110 |                                     sitemap_urls=sitemap_urls
111 |                                     )
112 |         log_info(f'⏭️ Step {i} # Predefined Prompt: {prompt}')
113 |         messages.append({"role": "user", "content": prompt})
114 | 
115 |         # Check for better prompt
116 |         if i > 2:
117 |             better_prompt = require_better_prompt(primary_keywords, prompt, messages)
118 |             if better_prompt:
119 |                 prompt = messages[-1]["content"] = better_prompt  # also swap the improved prompt into the pending user message
120 | 
121 |         # Add image
122 |         add_image = False
123 |         if add_image:
124 |             image_content, already_sourced = get_image_with_commercial_usage(primary_keywords, prompt, already_sourced)
125 |             if image_content:
126 |                 append_content_to_file(filepath, image_content, None if CLI else st)  # only mirror to the Streamlit page outside CLI mode
127 | 
128 |         # Add News
129 |         news_data = require_data_for_prompt(primary_keywords, prompt)
130 |         if news_data:
131 |             messages.append({"role": "assistant", "content": f"Found news on the topic: {news_data}"})
132 | 
133 |         response = chat_with_open_ai(messages, temperature=0.9)
134 |         messages.append({"role": "assistant", "content": response})
135 | 
136 |         append_content_to_file(filepath, response, None if CLI else st)
137 |         log_info(f'🔺 Completed Step {i}. Words: {len(response.split(" "))}')
138 | 
139 |         i += 1
140 |         total_words += len(response.split(" "))
141 | 
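    # After the loop: read the draft back, build a table of contents with md_toc, and rewrite the file with the ToC prepended.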
142 |     #footer_message = f"🎁 Finished generation at {datetime.datetime.now()}. 📬 Total words: {total_words}"
143 |     #append_content_to_file(filepath, footer_message, None if CLI else st)
144 | 
145 |     # Read the generated content
146 |     with open(filepath, 'r') as file:
147 |         content = file.read()
148 | 
149 |     # Generate ToC
150 |     toc = build_toc(filepath)
151 | 
152 |     # Insert ToC at the beginning of the content
153 |     content_with_toc = toc + "\n\n" + content
154 | 
155 |     # Rewrite the file with ToC
156 |     with open(filepath, 'w') as file:
157 |         file.write(content_with_toc)
158 | 
159 | 
160 | 
161 | def run_streamlit_app():
162 |     st.title("📝BLOGEN v0.1 (Blog Generation Algorithm)")
163 | 
164 |     # Add a text input field
165 |     input_text = st.text_input("Enter some text:")
166 | 
167 |     # Add a submit button
168 |     if st.button("Submit"):
169 |         # Execute the function with the input text
170 |         generate_blog_for_keywords(input_text)
171 | 
172 | 
173 | def run_terminal_app(keywords):
174 |     generate_blog_for_keywords(keywords, SERVICE_NAME, SERVICE_DESCRIPTION, SERVICE_URL)
175 | 
176 | 
177 | if __name__ == "__main__":
178 |     CLI = True
179 |     setup_logger()
180 | 
181 |     if CLI:
182 |         _keywords = " ".join(sys.argv[1:])
183 |         if _keywords.strip() == "":
184 |             print("Error: keywords not specified.\nUSAGE: python blog_gen_algo_v0.1.py <keywords>")
185 |         while True:
186 |             if _keywords.strip() == "":
187 |                 _keywords = input("\nEnter the primary keywords:")
188 |             else:
189 |                 break
190 | 
191 |         log_info('Starting BLOGEN...')
192 |         run_terminal_app(_keywords)
193 | 
194 |     else:
195 |         run_streamlit_app()
196 | 
--------------------------------------------------------------------------------
/blog_gen_algo_v0.2.py:
--------------------------------------------------------------------------------
1 | from urllib.parse import urlparse
2 | from datetime import datetime
3 | import argparse
4 | 
5 | import streamlit as st
6 | import xml.etree.ElementTree as ET
7 | 
8 | from tools.chatgpt import chat_with_open_ai
9 | from tools.file import create_file_with_keyword, append_content_to_file
10 | from tools.logger import log_info, setup_logger
11 | from tools.scraper import fetch_and_parse
12 | from tools.serpapi import get_related_queries, get_image_with_commercial_usage, get_search_urls
13 | from tools.storyblok import post_article_to_storyblok
14 | from tools.subprocess import open_file_with_md_app
15 | from tools.const import OPENAI_TEMPERATURE, SERVICE_NAME, SERVICE_DESCRIPTION, SERVICE_URL
16 | from tokencost import calculate_prompt_cost, calculate_completion_cost
17 | 
18 | # Step-to-Model Mapping: Define your model preferences here
19 | step_to_model = {
20 |     1: 'gpt-4-0125-preview',  # Outline
21 |     2: 'gpt-3.5-turbo',  # Introduction
22 |     3: 'gpt-4-0125-preview',  # Body (...)
23 |     4: 'gpt-4-0125-preview',
24 |     5: 'gpt-4-0125-preview',
25 |     6: 'gpt-4-0125-preview',
26 |     7: 'gpt-4-0125-preview',
27 |     8: 'gpt-4-0125-preview',
28 |     9: 'gpt-4-0125-preview',  # Conclusion
29 |     10: 'gpt-4-0125-preview',  # FAQs
30 |     11: 'gpt-3.5-turbo',  # Related Posts
31 |     12: 'gpt-3.5-turbo',  # Meta Description
32 |     13: 'gpt-3.5-turbo',  # Title
33 |     14: 'gpt-3.5-turbo',  # Key Takeaways
34 |     15: 'gpt-3.5-turbo',  # ToC
35 | }
36 | 
37 | 
38 | steps_prompts = [
39 |     # Step 1
40 |     "Given the primary keywords - {primary_keywords}, the first step will be an outline of the article with 5-6 headings and respective subheadings. "
41 |     "Take into consideration the summary of the first 10 search results for the keyword: {summary_of_search_results}"
42 |     ,
43 |     # Step 2
44 |     "The second step is to write the introduction of the article, without any H2 title. Aim at 50-60 words, be concise yet impactful. "
45 |     ,
46 |     # Step 3
47 |     "You will proceed to write the first point of the outline (if this point doesn't exist, simply don't respond). "
48 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
49 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
50 |     "Whenever relevant, highlight tools that can help the user, "
51 |     "cover templates that allow the user to simply copy-paste "
52 |     "and include references to other websites if helpful for the user. "
53 |     ,
54 |     # Step 4
55 |     "You will proceed to write the second point of the outline (if this point doesn't exist, simply don't respond). "
56 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
57 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
58 |     "Whenever relevant, highlight tools that can help the user, "
59 |     "cover templates that allow the user to simply copy-paste "
60 |     "and include references to other websites if helpful for the user. "
61 |     ,
62 |     # Step 5
63 |     "You will proceed to write the third point of the outline (if this point doesn't exist, simply don't respond). "
64 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
65 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
66 |     "Whenever relevant, highlight tools that can help the user, "
67 |     "cover templates that allow the user to simply copy-paste "
68 |     "and include references to other websites if helpful for the user. "
69 |     ,
70 |     # Step 6
71 |     "You will proceed to write the fourth point of the outline (if this point doesn't exist, simply don't respond). "
72 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
73 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
74 |     "Whenever relevant, highlight tools that can help the user, "
75 |     "cover templates that allow the user to simply copy-paste "
76 |     "and include references to other websites if helpful for the user. "
77 |     ,
78 |     # Step 7
79 |     "You will proceed to write the fifth point of the outline (if this point doesn't exist, simply don't respond). "
80 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
81 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
82 |     "Whenever relevant, highlight tools that can help the user, "
83 |     "cover templates that allow the user to simply copy-paste "
84 |     "and include references to other websites if helpful for the user. "
85 |     ,
86 |     # Step 8
87 |     "You will proceed to write the sixth point of the outline (if this point doesn't exist, simply don't respond). "
88 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
89 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
90 |     "Whenever relevant, highlight tools that can help the user, "
91 |     "cover templates that allow the user to simply copy-paste "
92 |     "and include references to other websites if helpful for the user. "
93 |     ,
94 |     # Step 9
95 |     "You will create a concise conclusion paragraph, with H2 heading 'Conclusion'. "
96 |     "Define the anchor link with the following format: ## H2 Title "
97 |     ,
98 |     # Step 10
99 |     "You will create five unique Frequently Asked Questions (FAQs) after the conclusion. "
100 |     "The FAQs need to take the keyword into account at all times. "
101 |     "Make sure to add an anchor link to the H2 heading 'Frequently Asked Questions (FAQs)', with two new lines after the heading. "
102 |     "Define the anchor link with the following format: ## H2 Title "
103 |     "The FAQs should have the questions in H3 heading and the answers below (separated by a new line), "
104 |     "with the format: "
105 |     "### Question? "
106 |     "Answer"
107 |     ,
108 |     # Step 11
109 |     "Please create a related posts section (with H2 heading 'Related Posts'), with two new lines after the heading. "
110 |     "Include 3-4 articles that are the most relevant to this topic out of the existing blog posts described in the sitemap below: {sitemap_urls}. "
111 |     "The bullets should have the title of the article directly with the link to the article - e.g., in markdown [title](link)."
112 |     ,
113 |     # Step 12
114 |     "Please create a meta description (120-140 characters) for the article you just generated."
115 |     ,
116 |     # Step 13
117 |     "Please create 5 variations of a slightly click-baity (to invite the reader to click the link), SEO-optimized title (50-60 characters) for the article below. "
118 |     "Make sure to include the problem it is solving. Avoid futuristic and corporate type of words, phrase it as a How-To or even a Question. "
119 |     "The title should be in the format: 'Keyword - Subtitle', but only if the keyword fits well in the title. Don't use quotes or special characters in the title. "
120 |     "Present the titles in a single line (no bullets or numbers), each separated by a semicolon."
121 |     ,
122 |     # Step 14
123 |     "Create a Key Takeaways section summarising crucial points. "
124 |     "Make sure to use the H2 heading 'Key Takeaways' with two new lines after the heading. "
125 |     "The Key Takeaways should be in bullet format, with the format: "
126 |     "- Takeaway 1"
127 |     "\n- Takeaway 2"
128 |     ,
129 |     # Step 15
130 |     "Create a table of contents (ToC) for the article, only keeping H2 headings and excluding Key Takeaways and Introduction. "
131 |     "Do not create a 'Table of Contents' H2 heading. "
132 |     "Make sure to include links to each section in the ToC, with the format: "
133 |     "[H2 Title](#h2-title)"
134 |     ,
135 | ]
136 | 
137 | def load_sitemap_and_extract_urls(sitemap_path):
138 |     # Parse the XML file
139 |     tree = ET.parse(sitemap_path)
140 |     root = tree.getroot()
141 | 
142 |     # Namespace, often found in sitemap files
143 |     namespace = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
144 | 
145 |     # Extract URLs
146 |     urls = [elem.text for elem in root.findall('ns:url/ns:loc', namespace)]
147 |     return urls
148 | 
149 | def generate_blog_for_keywords(primary_keywords="knee replacement surgery", service_name=SERVICE_NAME, service_description=SERVICE_DESCRIPTION, service_url=SERVICE_URL, post_to_storyblok=False):
150 |     # Iterate through each prompt step
151 |     messages = []
152 |     payload = {"title": "", "metadescription": "", "intro": "", "body": "", "conclusion": "", "related_posts": "", "faqs": "", "keyword": primary_keywords, "key_takeaways": "", "toc": ""}
153 | 
154 |     filepath = create_file_with_keyword(primary_keywords)
155 |     log_info(f'🗂️ File Created {filepath}')
156 |     open_file_with_md_app(filepath)
157 | 
158 |     log_info(f'🎬 Primary Keywords: {primary_keywords}')
159 |     summarisation_model = "gpt-3.5-turbo"  # defined once up front: it is needed again after the loop
160 |     summarized_contents = []
161 |     total_cost = 0
162 |     urls = get_search_urls(primary_keywords, number_of_results=10)
163 |     for url in urls:
164 |         content = fetch_and_parse(url)
165 |         if content:
166 |             # Summarize the content using OpenAI
167 |             summary_prompt = f"Create a knowledge base of the maximum number of tools, templates and references, in 300 words or less: {content[:3000]}"
168 |             summary = chat_with_open_ai([{"role": "user", "content": summary_prompt}], model=summarisation_model)
169 |             summarized_contents.append(summary)
170 |             prompt_cost = calculate_prompt_cost(summary_prompt, model=summarisation_model)
171 |             completion_cost = calculate_completion_cost(summary, model=summarisation_model)
172 |             total_cost += prompt_cost + completion_cost
173 | 
174 |     summary_of_search_results = ""  # fallback so the step prompts still format when no search result could be fetched
175 |     if summarized_contents:
176 |         concatenated_summaries = " ".join(summarized_contents)  # Combine all summaries into one large text
177 |         summary_of_search_results_prompt = f"Summarize the following content in 300 words or less, focusing on covering as many tools, templates and references as possible: {concatenated_summaries}"
178 |         summary_of_search_results = chat_with_open_ai([{"role": "user", "content": summary_of_search_results_prompt}], model=summarisation_model)
179 |         log_info(f"Summary of search results: {summary_of_search_results}\nCost: {total_cost}")
180 | 
181 |     # Create the system message
182 |     system_message_1 = f"You are an AI writing expert with vast experience in writing techniques and frameworks. "\
183 |                        f"As a skilled content creator, you will craft a 100% unique, human-written, and SEO-optimized article in fluent English that is informative. You will focus exclusively on the keyword provided by the user. "\
184 |                        f"You will write a long-form article that is well-structured and easy to read, with an introduction, a body, a conclusion and five unique FAQs. "\
185 |                        f"Use John Gruber's Markdown to format your responses, with appropriate H2 to H4 tags. Use original content, avoid plagiarism, and increase readability. "\
186 |                        f"Avoid filler talk; focus on being helpful and reliable, and create people-first content for the reader, as per Google's guidelines: https://developers.google.com/search/docs/fundamentals/creating-helpful-content "\
187 |                        f"Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "\
188 |                        f"You explain step by step how to do the required actions for the user intent in the keyword provided. Give examples of tools and explain how to use them. Always link to their websites to ensure the credit is given to them (include the links as anchors in the most relevant words). "\
189 |                        f"Use an informational tone and always reference the sources you use with the website links, including if you are referring to competitors of Emilio as we need to maximize collaboration in the network. "\
190 |                        f"Whenever relevant, introduce Emilio (with website https://getemil.io/) and explain to the user how Emilio can help them with their problem. "\
191 |                        f"Emilio is an AI-powered email client designed to save users time. Key functionalities include sorting prioritized emails, summarizing messages and digesting the inbox, drafting emails with the user's tone, and requiring no installation as it operates in the background. The service integrates with the user's existing Gmail account. "\
192 |                        f"The interaction with the user will take several steps below. You will take the necessary time in every step, and do one at a time to ensure the maximum quality possible."
193 | 
194 |     messages.append({"role": "system", "content": system_message_1})
195 | 
196 |     sitemap_path = 'sitemap.xml'
197 |     sitemap_urls = load_sitemap_and_extract_urls(sitemap_path)
198 | 
199 |     i = 1
200 |     total_words = 0
201 |     total_cost = 0
202 |     for step_prompt in steps_prompts:
203 |         # Pre-defined prompt
204 |         prompt = step_prompt.format(primary_keywords=primary_keywords,
205 |                                     #tone_of_writing=tone_of_writing,
206 |                                     service_name=service_name,
207 |                                     service_description=service_description,
208 |                                     service_url=service_url,
209 |                                     sitemap_urls=sitemap_urls,
210 |                                     summary_of_search_results=summary_of_search_results
211 |                                     )
212 |         messages.append({"role": "user", "content": prompt})
213 | 
214 |         model = step_to_model.get(i, 'gpt-4-0125-preview')  # Fallback to a default model if not specified
215 |         prompt_cost = calculate_prompt_cost(messages, model)
216 | 
217 |         response = chat_with_open_ai(messages, model=model, temperature=OPENAI_TEMPERATURE)
218 |         completion_cost = calculate_completion_cost(response, model)
219 |         total_cost += prompt_cost + completion_cost
220 | 
221 |         messages.append({"role": "assistant", "content": response})
222 | 
223 |         # Don't append the response of the first step (the outline) to the article file
224 |         if i > 1:
225 |             append_content_to_file(filepath, response, None if CLI else st)  # only mirror to the Streamlit page outside CLI mode
226 |         log_info(f'🔺 Completed Step {i}. Words: {len(response.split(" "))}, Cost: {prompt_cost + completion_cost}')
227 | 
228 |         # Capture the response for each section (NOTE: the step index i drives both the model choice and this routing; keep steps_prompts, step_to_model and this mapping in sync)
229 |         if i == 2:  # Introduction
230 |             payload['intro'] += response
231 |         elif 3 <= i <= 8:  # Body sections
232 |             payload['body'] += response + "\n"
233 |         elif i == 9:  # Conclusion
234 |             payload['conclusion'] += response
235 |         elif i == 10:  # FAQs
236 |             payload['faqs'] += response
237 |         elif i == 11:  # Related posts
238 |             payload['related_posts'] += response
239 |         elif i == 12:  # Meta description
240 |             payload['metadescription'] += response
241 |         elif i == 13:  # Title
242 |             payload['title'] += response
243 |         elif i == 14:  # Key Takeaways
244 |             payload['key_takeaways'] += response
245 |         elif i == 15:  # ToC
246 |             payload['toc'] += response
247 | 
248 |         i += 1
249 |         total_words += len(response.split(" "))
250 | 
251 |     # Reassemble the content according to the new order
252 |     ## Split titles string into a list of individual titles
253 |     titles_list = payload['title'].split(';')
254 |     ## Construct the titles section with HTML comment and Markdown headings
255 |     titles_section = "\n"
256 |     for title in titles_list:
257 |         titles_section += f"# {title.strip()}\n"
258 |     titles_section += "\n"
259 | 
260 |     # Assemble the metadata
261 |     ## Assuming 'payload['title']' contains multiple titles separated by ";", choose the first one for the metadata
262 |     title_for_metadata = payload['title'].split(';')[0].strip()
263 |     ## Format the current date in YYYY-MM-DD format for the metadata
264 |     current_date = datetime.now().strftime("%Y-%m-%d")
265 |     ## Construct the metadata section
266 |     metadata_section = (
267 |         "---\n"
268 |         f"title: {title_for_metadata}\n"
269 |         f"date: {current_date}\n"
270 |         "authors:\n"
271 |         "  - name: \n"
272 |         "    title: \n"
273 |         "    picture: \n"
274 |         "tags: [\"1\", \"2\", \"3\", \"4\"]\n"
275 |         "---\n\n"
276 |     )
277 | 
278 |     ordered_content = (metadata_section +
279 |                        "\n" + payload['metadescription'] + "\n\n" +
280 |                        titles_section + "\n" +
281 |                        "\n" + payload['intro'] + "\n\n" +
282 |                        "\n" + payload['toc'] + "\n\n" +
283 |                        payload['key_takeaways'] + "\n\n" +
284 |                        "\n" + payload['body'] + "\n\n" +
285 |                        payload['conclusion'] + "\n\n" +
286 |                        payload['related_posts'] + "\n\n" +
287 |                        payload['faqs']
288 |                        )
289 | 
290 |     # Write the article to the file first, before any optional Storyblok upload
291 |     with open(filepath, 'w') as file:
292 |         file.write(ordered_content)
293 | 
294 |     log_info(f'Total cost of operation: {total_cost}')
295 | 
296 |     # After the loop, optionally send the payload to Storyblok
297 |     if post_to_storyblok:
298 |         post_article_to_storyblok(payload)
299 | 
300 | def run_streamlit_app():
301 |     st.title("📝BLOGEN v0.2 (Blog Generation Algorithm)")
302 | 
303 |     # Add a text input field
304 |     input_text = st.text_input("Enter some text:")
305 | 
306 |     # Add a submit button
307 |     if st.button("Submit"):
308 |         # Execute the function with the input text
309 |         generate_blog_for_keywords(input_text)
310 | 
311 | 
312 | def run_terminal_app(keywords, post_to_storyblok=False):
313 |     generate_blog_for_keywords(keywords, SERVICE_NAME, SERVICE_DESCRIPTION, SERVICE_URL, post_to_storyblok)
314 | 
315 | 
316 | if __name__ == "__main__":
317 |     CLI = True
318 |     setup_logger()
319 | 
320 |     # Set up argument parser
321 |     parser = argparse.ArgumentParser(description="Generate a blog post and optionally post to Storyblok.")
322 |     parser.add_argument('--storyblok', action='store_true', help="If set, post the generated content to Storyblok.")
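    # Example invocation (hypothetical keyword): python blog_gen_algo_v0.2.py --storyblok best email client for gmail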
323 |     parser.add_argument('keywords', metavar='keyword', type=str, nargs='+', help='Keywords for generating the blog post.')
324 | 
325 |     # Parse arguments
326 |     args = parser.parse_args()
327 | 
328 |     # Extract options and keywords
329 |     post_to_storyblok = args.storyblok
330 |     keywords = " ".join(args.keywords)
331 | 
332 |     log_info('Starting BLOGEN...')
333 | 
334 |     if keywords.strip() == "":
335 |         print("Error: No keywords specified. Please provide at least one keyword.")
336 |     else:
337 |         run_terminal_app(keywords, post_to_storyblok=post_to_storyblok)
338 | 
--------------------------------------------------------------------------------
/others.py:
--------------------------------------------------------------------------------
1 | from tools.chatgpt import chat_with_open_ai
2 | from tools.storyblok import fetch_articles
3 | import csv
4 | 
5 | def extract_text(content):
6 |     text = ""
7 |     if isinstance(content, dict):
8 |         if 'content' in content:
9 |             for item in content['content']:
10 |                 text += extract_text(item)
11 |         if 'text' in content:
12 |             text += content['text']
13 |     elif isinstance(content, list):
14 |         for item in content:
15 |             text += extract_text(item)
16 |     return text
17 | 
18 | def preprocess_title(title):
19 |     # Replace new lines with spaces or any other suitable character/formatting
20 |     return title.replace("\n", " ").strip()
21 | 
22 | def improve_articles_names(articles, csv_filename):
23 |     with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
24 |         csvwriter = csv.writer(csvfile)
25 |         # Writing the header of the CSV file
26 |         csvwriter.writerow(['Slug', 'Original Title', 'Suggested Title 1', 'Suggested Title 2', 'Suggested Title 3', 'Suggested Title 4', 'Suggested Title 5'])
27 | 
28 |         # Process every fetched article
29 |         for article in articles:
30 |             slug = article['slug']
31 |             name = article['name']
32 |             body = extract_text(article['content']['body'])
33 | 
34 |             prompt = f"""
35 |             Please create 5 variations of a slightly click-baity (to invite the reader to click the link), SEO-optimized title (50-60 characters) for the article below.
36 |             The keyword associated with the article is '{slug}'.
37 |             Make sure to include the problem it is solving. Avoid futuristic and corporate type of words, phrase it as a How-To or even a Question.
38 |             The title should be in the format: 'Keyword: Subtitle', but only if the keyword fits well in the title. Don't use quotes or special characters in the title.
39 |             Present the titles in a single line (no bullets or numbers), each separated by a semicolon. Respond only with the titles, no need to include any other information.
40 | 
40 | \nArticle: \n{body} 41 | """ 42 | 43 | suggested_names_str = chat_with_open_ai([{"role": "user", "content": prompt}], model='gpt-3.5-turbo') 44 | suggested_names = suggested_names_str.split(';') 45 | 46 | # Preprocess each suggested title to handle new lines 47 | suggested_names = [preprocess_title(title) for title in suggested_names] 48 | 49 | # Ensure that there are exactly 5 suggested titles 50 | suggested_names = suggested_names[:5] + [''] * (5 - len(suggested_names)) 51 | 52 | csvwriter.writerow([slug, name] + suggested_names) 53 | 54 | if __name__ == "__main__": 55 | articles = fetch_articles() 56 | improve_articles_names(articles, 'suggested_titles.csv') 57 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # Generated on Tue Jul 25 13:00:17 IST 2023 2 | aiohttp==3.8.5 3 | aiosignal==1.3.1 4 | altair==5.0.1 5 | async-timeout==4.0.2 6 | attrs==23.1.0 7 | blinker==1.6.2 8 | cachetools==5.3.1 9 | certifi==2023.7.22 10 | charset-normalizer==3.2.0 11 | click==8.1.6 12 | decorator==5.1.1 13 | frozenlist==1.4.0 14 | gitdb==4.0.10 15 | GitPython==3.1.32 16 | google-search-results==2.4.2 17 | idna==3.4 18 | importlib-metadata==6.8.0 19 | Jinja2==3.1.2 20 | jsonschema==4.18.4 21 | jsonschema-specifications==2023.7.1 22 | markdown-it-py==3.0.0 23 | MarkupSafe==2.1.3 24 | mdurl==0.1.2 25 | md_toc==8.2.2 26 | multidict==6.0.4 27 | numpy==1.25.1 28 | openai==0.27.8 29 | packaging==23.1 30 | pandas==2.0.3 31 | Pillow==9.5.0 32 | protobuf==4.23.4 33 | pyarrow==12.0.1 34 | pydeck==0.8.0 35 | Pygments==2.15.1 36 | Pympler==1.0.1 37 | python-dateutil==2.8.2 38 | python-dotenv==1.0.0 39 | pytz==2023.3 40 | pytz-deprecation-shim==0.1.0.post0 41 | referencing==0.30.0 42 | requests==2.31.0 43 | rich==13.4.2 44 | rpds-py==0.9.2 45 | six==1.16.0 46 | smmap==5.0.0 47 | streamlit==1.25.0 48 | tenacity==8.2.2 49 | tokencost==0.1.2 50 | toml==0.10.2 51 | toolz==0.12.0 52 | tornado==6.3.2 53 | tqdm==4.65.0 54 | typing_extensions==4.7.1 55 | tzdata==2023.3 56 | tzlocal==4.3.1 57 | urllib3==2.0.4 58 | validators==0.20.0 59 | yarl==1.9.2 60 | zipp==3.16.2 61 | markdown==3.3.4 62 | serpapi==0.1.5 63 | beautifulsoup4==4.12.3 64 | -------------------------------------------------------------------------------- /tools/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rbatista191/blogen/b7490f80be17862eb30f7e9368d589d2b5215686/tools/__init__.py -------------------------------------------------------------------------------- /tools/chatgpt.py: -------------------------------------------------------------------------------- 1 | from time import sleep 2 | 3 | import openai 4 | 5 | from tools.const import OPENAI_API_KEY, OPENAI_MAX_TOKENS 6 | from tools.logger import log_info 7 | 8 | openai.api_key = OPENAI_API_KEY 9 | 10 | def chat_with_open_ai(conversation, model="gpt-4-0125-preview", temperature=0): 11 | max_retry = 3 12 | retry = 0 13 | messages = [{'role': x.get('role', 'assistant'), 14 | 'content': x.get('content', '')} for x in conversation] 15 | while True: 16 | try: 17 | response = openai.ChatCompletion.create(model=model, messages=messages, temperature=temperature) 18 | text = response['choices'][0]['message']['content'] 19 | 20 | # trim message object 21 | debug_object = [i['content'] for i in messages] 22 | debug_object.append(text) 23 | if response['usage']['total_tokens'] >= OPENAI_MAX_TOKENS: 
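                # The reply hit the configured token ceiling: chunk oversized entries and drop the oldest non-system message (note: this trims only the local list built above, not the caller's conversation).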
24 |                 messages = split_long_messages(messages)
25 |                 if len(messages) > 1:
26 |                     messages.pop(1)
27 | 
28 |             return text
29 |         except Exception as oops:
30 |             log_info(f'Error communicating with OpenAI: "{oops}"')
31 |             if 'maximum context length' in str(oops):
32 |                 messages = split_long_messages(messages)
33 |                 if len(messages) > 1:
34 |                     messages.pop(1)
35 |                 log_info(' DEBUG: Trimming oldest message')
36 |                 continue
37 |             retry += 1
38 |             if retry >= max_retry:
39 |                 log_info(f"Exiting due to excessive errors in API: {oops}")
40 |                 return str(oops)
41 |             log_info(f'Retrying in {2 ** (retry - 1) * 5} seconds...')
42 |             sleep(2 ** (retry - 1) * 5)
43 | 
44 | 
45 | def split_long_messages(messages):
46 |     new_messages = []
47 |     for message in messages:
48 |         content = message['content']
49 |         if len(content.split()) > 1000:
50 |             # Split the content into 1,000-character chunks (the threshold above counts ~1,000 words, not tokens)
51 |             chunks = [content[i:i + 1000] for i in range(0, len(content), 1000)]
52 | 
53 |             # Create new messages for each chunk
54 |             for i, chunk in enumerate(chunks):
55 |                 new_message = {'role': message['role'], 'content': chunk}
56 |                 if i == 0:
57 |                     # The first chunk replaces the original message
58 |                     new_messages.append(new_message)
59 |                 else:
60 |                     # Subsequent chunks are appended as new messages
61 |                     new_messages.append({'role': message['role'], 'content': chunk})
62 |         else:
63 |             new_messages.append(message)  # No splitting required, add original message as it is
64 | 
65 |     return new_messages
66 | 
--------------------------------------------------------------------------------
/tools/const.py:
--------------------------------------------------------------------------------
1 | import os
2 | from dotenv import load_dotenv
3 | 
4 | load_dotenv()
5 | 
6 | BLOG_WRITING_TONES = [
7 |     "Informative",
8 |     "Conversational",
9 |     "Inspirational/Motivational",
10 |     "Educational",
11 |     "Humorous",
12 |     "Thought-Provoking",
13 |     "Authoritative",
14 |     "Empathetic",
15 |     "Personal",
16 |     "Argumentative/Persuasive",
17 |     "Storytelling",
18 |     "Expository"
19 | ]
20 | 
21 | OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', '')
22 | OPENAI_MAX_TOKENS = int(os.getenv('OPENAI_MAX_TOKENS', 4000))
23 | OPENAI_TEMPERATURE = float(os.getenv('OPENAI_TEMPERATURE', 0.9))
24 | OPENAI_STOP_SEQ = os.getenv('OPENAI_STOP_SEQ', '\n')
25 | 
26 | STORYBLOK_MANAGEMENTAPI_TOKEN = os.getenv('STORYBLOK_MANAGEMENTAPI_TOKEN', '')
27 | STORYBLOK_CONTENTAPI_TOKEN = os.getenv('STORYBLOK_CONTENTAPI_TOKEN', '')
28 | STORYBLOK_SPACE_ID = os.getenv('STORYBLOK_SPACE_ID', '')
29 | 
30 | SERP_API_KEY = os.getenv('SERP_API_KEY', '')
31 | 
32 | SERVICE_NAME = os.getenv('SERVICE_NAME', '')
33 | SERVICE_DESCRIPTION = os.getenv('SERVICE_DESCRIPTION', '')
34 | SERVICE_URL = os.getenv('SERVICE_URL', '')
35 | 
36 | 
--------------------------------------------------------------------------------
/tools/decision.py:
--------------------------------------------------------------------------------
1 | from tools.chatgpt import chat_with_open_ai
2 | from tools.logger import log_info
3 | from tools.serpapi import get_latest_news
4 | 
5 | 
6 | def find_tone_of_writing(primary_keywords, messages):
7 |     new_messages = list(messages)  # copy, so the probe question doesn't linger in the caller's conversation
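    # The probe below goes to the copied history only; callers just use the returned tone string.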
8 |     new_messages.append({"role": "user", "content": f"Which tone suits best for writing a blog on {primary_keywords}? "
9 |                                                      f"Give one word answer."})
10 |     tone_of_writing = chat_with_open_ai(new_messages, temperature=1)
11 |     return tone_of_writing
12 | 
13 | 
14 | def require_data_for_prompt(primary_keywords, next_prompt):
15 |     new_messages = [
16 |         {"role": "system", "content": "Act as an experienced SEO specialist and experienced content writer."},
17 |         {"role": "user", "content": f"You will be asked to respond to: {next_prompt} \n\n "
18 |                                     f"Would you require the latest news on {primary_keywords}? "
19 |                                     f"Respond with either \"yes\" or \"no\"."}]
20 |     require_news = chat_with_open_ai(new_messages, temperature=1)
21 |     if "yes" in require_news.lower():
22 |         news_data = get_latest_news(primary_keywords, next_prompt)
23 |         log_info(f'🚨 Get News: {news_data}')
24 |         return news_data
25 |     else:
26 |         log_info(f'👷‍ No news')
27 |         return None
28 | 
29 | 
30 | def require_better_prompt(primary_keywords, next_prompt, messages):
31 |     new_messages = [
32 |         {"role": "system", "content": "Act as an experienced SEO specialist and experienced content writer."}
33 |     ]
34 |     for previous in messages:
35 |         if previous.get('role') == 'assistant':
36 |             new_messages.append(previous)
37 |     new_messages.append({"role": "user", "content": f"You will be asked to respond to: {next_prompt} \n\n "
38 |                                                     f"Can you suggest a better prompt judging the intent of the user: {primary_keywords}. "
39 |                                                     f"Respond with only the prompt that you would ask an SEO specialist "
40 |                                                     f"or simply reply \"no\" if the given prompt is fine."})
41 |     better_prompt = chat_with_open_ai(new_messages, temperature=1)
42 |     if better_prompt.strip().strip('".').lower() == "no":  # exact match: a genuine prompt could merely contain "no"
43 |         log_info(f'✅ Prompt Ok ')
44 |         return None
45 |     else:
46 |         log_info(f'🍀 Better Prompt: {better_prompt}')
47 |         return better_prompt
48 | 
--------------------------------------------------------------------------------
/tools/file.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | import os
3 | import sys
4 | import re
5 | 
6 | from tools.logger import log_info
7 | 
8 | project_root = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
9 | 
10 | # Add the project root directory to the Python path
11 | sys.path.append(project_root)
12 | 
13 | 
14 | def get_ordinal_suffix(number):
15 |     if 10 <= number % 100 <= 20:
16 |         suffix = "th"
17 |     else:
18 |         suffix = {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")
19 |     return suffix
20 | 
21 | 
22 | def append_content_to_file(filename, new_content, st=None):
23 |     if st:
24 |         st.write(new_content)
25 | 
26 |     with open(filename, "a") as file:
27 |         # Add a newline before writing the new content
28 |         file.write("\n\n" + new_content)
29 | 
30 | 
31 | def create_file_with_keyword(keywords, directory="_blogs", extension="md"):
32 |     if not os.path.exists(directory):
33 |         os.makedirs(directory)
34 | 
35 |     processed_keywords = re.sub(r'[^\w\s]', ' ', keywords)
36 |     processed_keywords = processed_keywords.strip().lower()
37 | 
38 |     title = '_'.join(processed_keywords.split(' '))
39 |     subdirectory = os.path.join(directory, title)
40 |     if not os.path.exists(subdirectory):
41 |         os.makedirs(subdirectory)
42 | 
43 |     # Find the number of files starting with the same keyword
44 |     num_files_with_same_keyword = sum(1 for file in os.listdir(subdirectory) if file.startswith(title))
45 |     if num_files_with_same_keyword > 0:
46 |         ordinal_suffix = get_ordinal_suffix(num_files_with_same_keyword + 1)
47 |         filename = f"{title}-{num_files_with_same_keyword + 1}{ordinal_suffix}.{extension}"
48 |     else:
49 |         filename = f"{title}.{extension}"
50 | 
51 | 
filepath = os.path.join(subdirectory, filename) 52 | 53 | # Create the new file 54 | #with open(filepath, "w") as file: 55 | # file.write(f"Runtime: {datetime.datetime.now()}") 56 | 57 | return filepath 58 | -------------------------------------------------------------------------------- /tools/logger.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | import logging.handlers 4 | 5 | from logging.handlers import TimedRotatingFileHandler 6 | 7 | 8 | class CustomFormatter(logging.Formatter): 9 | grey = "\x1b[38;5;240m" 10 | yellow = "\x1b[38;5;226m" 11 | red = "\x1b[38;5;196m" 12 | bold_red = "\x1b[31;1m" 13 | reset = "\x1b[0m" 14 | 15 | format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)" 16 | 17 | FORMATS = { 18 | logging.DEBUG: grey + format + reset, 19 | logging.INFO: grey + format + reset, 20 | logging.WARNING: yellow + format + reset, 21 | logging.ERROR: red + format + reset, 22 | logging.CRITICAL: bold_red + format + reset 23 | } 24 | 25 | def format(self, record): 26 | log_fmt = self.FORMATS.get(record.levelno) 27 | formatter = logging.Formatter(log_fmt) 28 | return formatter.format(record) 29 | 30 | 31 | def setup_logger(filename='theblogsystem.log'): 32 | logger = logging.getLogger() 33 | logger.setLevel(logging.INFO) 34 | 35 | # Create console handler with a different formatter (not JSON) 36 | ch = logging.StreamHandler() 37 | ch.setLevel(logging.INFO) 38 | color_formatter = CustomFormatter() 39 | ch.setFormatter(color_formatter) 40 | 41 | # Create rotating file handler 42 | subdirectory = '_logs' 43 | if not os.path.exists(subdirectory): 44 | os.makedirs(subdirectory) 45 | filepath = os.path.join(subdirectory, filename) 46 | 47 | fh = TimedRotatingFileHandler(filepath, when='D', interval=1, backupCount=30) 48 | fh.setLevel(logging.INFO) 49 | 50 | # FIX: Use JsonFormatter for 3rd Party Tools 51 | # jfh = jsonlogger.JsonFormatter('%(asctime)s %(levelname)s %(message)s') 52 | # jfh.setFormatter(file_formatter) 53 | 54 | fh.setFormatter(color_formatter) 55 | 56 | # Add the handlers to the logger 57 | logger.addHandler(ch) 58 | logger.addHandler(fh) 59 | 60 | 61 | def construct_log_message(message, *args, **kwargs): 62 | log_message = message 63 | if args: 64 | log_message += f" - {args}" 65 | if kwargs: 66 | log_message += f" - {kwargs}" 67 | return log_message 68 | 69 | 70 | def log_error(message, *args, **kwargs): 71 | logger = logging.getLogger(__name__) 72 | logger.error(construct_log_message(message, *args, **kwargs)) 73 | 74 | 75 | def log_info(message, *args, **kwargs): 76 | logger = logging.getLogger(__name__) 77 | logger.info(construct_log_message(message, *args, **kwargs)) 78 | 79 | 80 | def log_debug(message, *args, **kwargs): 81 | logger = logging.getLogger(__name__) 82 | logger.debug(construct_log_message(message, *args, **kwargs)) 83 | 84 | def log_warn(message, *args, **kwargs): 85 | logger = logging.getLogger(__name__) 86 | logger.warn(construct_log_message(message, *args, **kwargs)) 87 | -------------------------------------------------------------------------------- /tools/scraper.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from bs4 import BeautifulSoup 3 | 4 | def fetch_and_parse(url): 5 | try: 6 | response = requests.get(url, timeout=30) 7 | response.raise_for_status() # Raises an HTTPError if the status is 4xx, 5xx 8 | soup = BeautifulSoup(response.text, 'html.parser') 9 | content = 
soup.find('body').get_text(separator='\n', strip=True) 10 | return content 11 | except Exception as e: 12 | print(f"Failed to fetch {url}: {e}") 13 | return None -------------------------------------------------------------------------------- /tools/serpapi.py: -------------------------------------------------------------------------------- 1 | from serpapi import GoogleSearch 2 | 3 | from tools.chatgpt import chat_with_open_ai 4 | from tools.const import SERP_API_KEY 5 | from tools.logger import log_info 6 | 7 | def get_search_urls(keyword, number_of_results=5): 8 | params = { 9 | "engine": "google", 10 | "q": keyword, 11 | "api_key": SERP_API_KEY, 12 | } 13 | search = GoogleSearch(params) 14 | results = search.get_dict() 15 | search_results = results.get("organic_results", []) 16 | urls = [result["link"] for result in search_results[:number_of_results]] 17 | return urls 18 | 19 | 20 | def get_related_queries(keyword): 21 | params = { 22 | "engine": "google_trends", 23 | "q": keyword, 24 | "data_type": "RELATED_QUERIES", 25 | "api_key": SERP_API_KEY 26 | } 27 | 28 | search = GoogleSearch(params) 29 | results = search.get_dict() 30 | related_queries = results.get("related_queries", {}) 31 | 32 | # Extract rising and top queries separately and combine them into a single list 33 | rising_queries = [query["query"] for query in related_queries.get("rising", [])] 34 | top_queries = [query["query"] for query in related_queries.get("top", [])] 35 | all_queries = rising_queries[:2] + top_queries[:2] 36 | 37 | return ", ".join(all_queries) 38 | 39 | 40 | def get_latest_news(keywords, prompt): 41 | messages = [ 42 | {"role": "user", "content": f"Act as an experienced SEO specialist and experienced content writer. " 43 | f"Given keywords - [{keywords}], and a prompt [{prompt}] " 44 | f"Find the necessary 2-3 keywords related to primary keywords " 45 | f"from the given prompt to search news from Google News. " 46 | f"Respond only with those keywords comma separated"} 47 | ] 48 | keywords = chat_with_open_ai(messages, temperature=1) 49 | 50 | log_info(f'🈂️ Keywords for news: {keywords}') 51 | params = { 52 | "engine": "google", 53 | "q": keywords, 54 | "tbm": "nws", 55 | "api_key": SERP_API_KEY 56 | } 57 | 58 | search = GoogleSearch(params) 59 | results = search.get_dict() 60 | 61 | # Extract news results and format them into a consumable string 62 | news_list = results.get("news_results", []) 63 | news_string = "" 64 | for news in news_list[:5]: 65 | title = news.get("title", "") 66 | link = news.get("link", "") 67 | source = news.get("source", "") 68 | published_date = news.get("published_date", "") 69 | summary = news.get("snippet", "") 70 | 71 | news_string += f"Title: {title}\n" 72 | news_string += f"Link: {link}\n" 73 | news_string += f"Source: {source}\n" 74 | news_string += f"Published Date: {published_date}\n" 75 | news_string += f"Summary: {summary}\n\n" 76 | 77 | return news_string 78 | 79 | 80 | def get_image_with_commercial_usage(keywords, prompt, already_sourced): 81 | # messages = [ 82 | # {"role": "user", "content": f"Act as an experienced SEO specialist and experienced content writer. " 83 | # f"Given primary keywords - [{keywords}], and a prompt [{prompt}] " 84 | # f"Find the necessary 2-3 keywords related to primary keywords " 85 | # f"from the given prompt to search images from Google Images. 
" 86 | # f"Respond only with those keywords comma separated."} 87 | # ] 88 | # keywords = chat_with_open_ai(messages, temperature=1) 89 | # if "no" in keywords.lower(): 90 | # return None, already_sourced 91 | 92 | log_info(f'🏞️ Keywords for image: {keywords}') 93 | params = { 94 | "engine": "google", 95 | "q": keywords, 96 | "tbm": "isch", 97 | "tbs": "sur:fmc", 98 | "api_key": SERP_API_KEY 99 | } 100 | search = GoogleSearch(params) 101 | results = search.get_dict() 102 | 103 | image_results = results.get("images_results", []) 104 | 105 | for image in image_results: 106 | image_source = image.get("source", "") 107 | image_url = image.get("original", "") 108 | if image_source in already_sourced or image_url in already_sourced: 109 | continue 110 | already_sourced.append(image_source) 111 | already_sourced.append(image_url) 112 | image_title = image.get("title", "") 113 | image_content = f"![{image_title}]({image_url})\n Source: {image_source}\n\n" 114 | return image_content, already_sourced 115 | 116 | return None, already_sourced 117 | -------------------------------------------------------------------------------- /tools/storyblok.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | from tools.const import STORYBLOK_MANAGEMENTAPI_TOKEN, STORYBLOK_CONTENTAPI_TOKEN, STORYBLOK_SPACE_ID 4 | from tools.logger import log_info 5 | import markdown 6 | 7 | # Variables 8 | mgmtapi_token = STORYBLOK_MANAGEMENTAPI_TOKEN 9 | cntapi_token = STORYBLOK_CONTENTAPI_TOKEN 10 | space_id = STORYBLOK_SPACE_ID 11 | 12 | 13 | def post_article_to_storyblok(article_data): 14 | url = f"https://mapi.storyblok.com/v1/spaces/{space_id}/stories/" 15 | headers = { 16 | "Content-Type": "application/json", 17 | "Authorization": f"{mgmtapi_token}", 18 | } 19 | 20 | # Convert Markdown fields to HTML 21 | intro_html = markdown.markdown(article_data["intro"]) 22 | body_html = markdown.markdown(article_data["body"]) 23 | conclusion_html = markdown.markdown(article_data["conclusion"]) 24 | related_posts_html = markdown.markdown(article_data["related_posts"]) 25 | faqs_html = markdown.markdown(article_data["faqs"]) 26 | key_takeaways_html = markdown.markdown(article_data["key_takeaways"]) 27 | toc_html = markdown.markdown(article_data["toc"]) 28 | 29 | # Structure the payload according to Storyblok's requirements 30 | payload = { 31 | "story": { 32 | "name": article_data["title"], 33 | "slug": article_data["keyword"].lower().replace(" ", "-").replace("_", "-"), 34 | "content": { 35 | "component": "article", 36 | "title": article_data["title"], 37 | "metadescription": article_data["metadescription"], 38 | "intro": intro_html, 39 | "body": body_html, 40 | "conclusion": conclusion_html, 41 | "related_posts": related_posts_html, 42 | "faqs": faqs_html, 43 | "key_takeaways": key_takeaways_html, 44 | "toc": toc_html, 45 | } 46 | }, 47 | } 48 | 49 | # For creating a new story 50 | response = requests.post(url, data=json.dumps(payload), headers=headers) 51 | print(response) 52 | 53 | # For updating an existing story, use PUT request instead 54 | # response = requests.put(f"https://mapi.storyblok.com/v1/spaces/{space_id}/stories/{story_id}", json=data, headers=headers) 55 | 56 | if response.status_code == 200 or response.status_code == 201: 57 | log_info("Article posted successfully!") 58 | return response.json() 59 | else: 60 | log_info(f"Failed to post article. 
Status code: {response.status_code}, Message: {response.text}") 61 | return None 62 | 63 | def fetch_articles(): 64 | url = f"https://api.storyblok.com/v1/cdn/stories?version=published&token={cntapi_token}&space_id={space_id}" 65 | response = requests.get(url) 66 | if response.status_code == 200: 67 | articles = response.json().get('stories', []) 68 | return articles # Return all article components 69 | else: 70 | log_info(f"Failed to fetch articles. Status code: {response.status_code}, Message: {response.text}") 71 | return [] 72 | 73 | def update_article_in_storyblok(article_id, article_data, slug=""): 74 | url = f"https://mapi.storyblok.com/v1/spaces/{space_id}/stories/{article_id}" 75 | headers = { 76 | "Content-Type": "application/json", 77 | "Authorization": f"{mgmtapi_token}", 78 | } 79 | 80 | response = requests.put(url, data=json.dumps(article_data), headers=headers) 81 | 82 | if response.status_code == 200: 83 | print(f"Article '{slug}' updated successfully!") 84 | return response.json() 85 | else: 86 | print(f"Failed to update article. Status code: {response.status_code}, Message: {response.text}") 87 | return None 88 | -------------------------------------------------------------------------------- /tools/subprocess.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | 3 | 4 | def open_file_with_md_app(filepath): 5 | try: 6 | subprocess.run(["open", "-a", "Macdown.app", filepath], check=True) 7 | except subprocess.CalledProcessError as e: 8 | print(f"Error occurred while opening the file: {e}") 9 | except FileNotFoundError: 10 | print("The 'open' command is not available on this system (non-macOS).") 11 | --------------------------------------------------------------------------------
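Note: `open_file_with_md_app` above is macOS-specific (it shells out to `open -a Macdown.app`). A minimal cross-platform sketch, assuming the OS default handler for Markdown files is acceptable (a hypothetical helper, not part of the repo):

```python
import platform
import subprocess


def open_file_cross_platform(filepath):
    """Open a file with the OS default application; print the failure instead of raising."""
    try:
        system = platform.system()
        if system == "Darwin":  # macOS
            subprocess.run(["open", filepath], check=True)
        elif system == "Windows":
            import os
            os.startfile(filepath)  # Windows-only API
        else:  # assume a Linux desktop with xdg-utils available
            subprocess.run(["xdg-open", filepath], check=True)
    except (OSError, subprocess.CalledProcessError) as error:
        print(f"Could not open {filepath} automatically: {error}")
```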