├── .env.example
├── .gitignore
├── README.md
├── __init__.py
├── blog_gen_algo_v0.1.py
├── blog_gen_algo_v0.2.py
├── others.py
├── requirements.txt
└── tools
    ├── __init__.py
    ├── chatgpt.py
    ├── const.py
    ├── decision.py
    ├── file.py
    ├── logger.py
    ├── scraper.py
    ├── serpapi.py
    ├── storyblok.py
    └── subprocess.py


/.env.example:
--------------------------------------------------------------------------------
 1 | # OpenAI
 2 | OPENAI_API_KEY=
 3 | OPENAI_MODEL=
 4 | OPENAI_MAX_TOKENS=
 5 | OPENAI_TEMPERATURE=
 6 | OPENAI_STOP_SEQ=
 7 | 
 8 | # SerpAPI
 9 | SERP_API_KEY=
10 | 
11 | #Service
12 | SERVICE_NAME=
13 | SERVICE_DESCRIPTION=
14 | SERVICE_URL= 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .env
2 | .idea
3 | _logs
4 | _blogs
5 | venv
6 | sitemap.xml
7 | tools/__pycache__
8 | .csv
9 | .venv


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # BLOGEN - Blog Generation Application (Version 0.1)
  2 | 
  3 | BLOGEN is a blog generation application designed to create well-structured blog posts using Markdown formatting. It takes primary keywords as input and generates engaging and informative blog content for various topics. This README file provides an overview of the BLOGEN application and instructions for usage.
  4 | 
  5 | ## Table of Contents
  6 | - [BLOGEN - Blog Generation Application (Version 0.1)](#blogen---blog-generation-application-version-01)
  7 |   - [Table of Contents](#table-of-contents)
  8 |   - [Introduction](#introduction)
  9 |   - [How it works](#how-it-works)
 10 |   - [Features](#features)
 11 |   - [Getting Started](#getting-started)
 12 |   - [Usage](#usage)
 13 |   - [Dependencies](#dependencies)
 14 |   - [Contributing](#contributing)
 15 |   - [License](#license)
 16 |   - [Roadmap](#roadmap)
 17 | 
 18 | ## Introduction
 19 | BLOGEN is a Python-based blog generation tool that leverages the power of GPT-3.5 (OpenAI's language model) to create captivating blog posts. The application uses the primary keywords provided by the user to generate prompts and interactively calls the language model to produce content for each step of the blog creation process.
 20 | 
 21 | ## How it works
 22 | ![LF24T35](https://github.com/rbatista191/blogen/assets/138892976/421f58a7-3291-4a92-886f-99e287f905ba)
 23 | 
 24 | ## Features
 25 | - **Keyword-driven Content Generation**: BLOGEN accepts primary keywords as input to create blog content tailored to specific topics.
 26 | - **Iterative Blog Building**: The application follows a step-by-step approach to create a complete blog post, including introduction, tone, issue, remedy, options, implementation, pros and cons (optional), and conclusion.
 27 | - **Various Writing Tones**: BLOGEN offers the flexibility to generate blog content in different tones, such as informative, conversational, persuasive, and more.
 28 | - **Easy Integration with Markdown**: The generated blog content is formatted using Markdown, making it easily integrable with various platforms and content management systems.
 29 | - **Version Control**: This is Version 0.1 of the BLOGEN application, with potential updates and improvements planned for future releases.
 30 | 
 31 | ## Getting Started
 32 | To get started with BLOGEN, follow these steps:
 33 | 
 34 | 1. Clone the BLOGEN repository to your local machine:
 35 |    ```
 36 |    git clone https://github.com/rbatista191/blogen.git
 37 |    ```
 38 | 
 39 | 2. Install the required dependencies (ensure you have Python 3.x installed):
 40 |    ```
 41 |    pip install -r requirements.txt
 42 |    ```
 43 | 
 44 | 3. Create your own `.env` file based on `.env.example` and store the following
 45 | - Obtain an API key for your preferred OpenAI GPT language model
 46 | - Obtain an API key for [SerpAPI](https://serpapi.com/) (free version includes 100 searches/month)
 47 | - Define your service attributes to be fed into the blog article 
 48 | 
 49 | 4. Upload your sitemap to the root of the workspace with filename `sitemap.xml`
 50 | 
 51 | 5. Set your API key as an environment variable:
 52 |    ```
 53 |    export OPENAI_API_KEY=your_api_key
 54 |    export SERP_API_KEY=your_api_key
 55 |    ```
 56 |    
 57 | 6. Launch the BLOGEN application:
 58 |    ```
 59 |    python blog_gen_algo_v0.2.py [keyword]
 60 |    ```
 61 | 7. Alternatively use Streamlit for a browser interface:
 62 |    ```
 63 |    streamlit run blog_gen_algo_v0.2.py [keyword]
 64 |    ```
 65 | 
 66 | ## Usage
 67 | Upon launching the BLOGEN application, you will be prompted to enter primary keywords for your blog topic. Follow the on-screen instructions to provide the necessary information.
 68 | 
 69 | The application will iteratively call the GPT-3.5 language model to generate content for each step of the blog creation process. The generated content will be presented in Markdown format.
 70 | 
 71 | Once the blog is fully generated, you can copy the Markdown content and paste it into your preferred platform or content management system for publishing.
 72 | 
 73 | ## Dependencies
 74 | BLOGEN relies on the following Python packages:
 75 | 
 76 | - `openai`: The official Python package for interfacing with the OpenAI GPT-3.5 language model.
 77 | - `click`: A Python package for creating beautiful command-line interfaces.
 78 | - `markdown`: A package for processing and rendering Markdown content.
 79 | 
 80 | ## Contributing
 81 | Contributions to BLOGEN are welcome! If you have ideas for improvements or bug fixes, please feel free to open an issue or submit a pull request. Before contributing, make sure to read our [Contributing Guidelines](CONTRIBUTING.md).
 82 | 
 83 | ## License
 84 | BLOGEN is licensed under the [MIT License](LICENSE). Feel free to use, modify, and distribute this software as per the terms of the license.
 85 | 
 86 | For any questions or feedback, please contact me at `gaurav18115@gmail.com`.
 87 | 
 88 | ## Roadmap
 89 | - [x] Create Key Takeaway section and shorten Introduction
 90 | - [x] Fix cost calculation by including full message instead of prompt only
 91 | - [x] Find a way to put anchor links on headings
 92 | - [x] Create separate table-of-content section
 93 | - [x] Swap Intro with Key Takeaways
 94 | - [ ] Integrate [Unsplash API](https://unsplash.com/developers) to upload the picture to CMS
 95 | - [ ] Improve YouTube link generation
 96 | - [ ] Create logic to check the URLs to avoid 404
 97 | 
 98 | ---
 99 | This README file was generated using BLOGEN (Version 0.2) - The Blog Generation Application.
100 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rbatista191/blogen/b7490f80be17862eb30f7e9368d589d2b5215686/__init__.py


--------------------------------------------------------------------------------
/blog_gen_algo_v0.1.py:
--------------------------------------------------------------------------------
  1 | import datetime
  2 | import sys
  3 | 
  4 | import streamlit as st
  5 | from md_toc import build_toc
  6 | import xml.etree.ElementTree as ET
  7 | 
  8 | from tools.chatgpt import chat_with_open_ai
  9 | from tools.decision import require_data_for_prompt, require_better_prompt, find_tone_of_writing
 10 | from tools.file import create_file_with_keyword, append_content_to_file
 11 | from tools.logger import log_info, setup_logger
 12 | from tools.serpapi import get_related_queries, get_image_with_commercial_usage
 13 | from tools.subprocess import open_file_with_md_app
 14 | from tools.const import SERVICE_NAME
 15 | from tools.const import SERVICE_DESCRIPTION
 16 | from tools.const import SERVICE_URL
 17 | 
 18 | 
 19 | steps_prompts = [
 20 |     # Step 1
 21 |     "Step 1: Given the primary keywords - {primary_keywords}, generate a captivating 5-8 words blog title. "
 22 |     "After that, write a 40-50 words teaser in {tone_of_writing} tone, "
 23 |     "something that creates curiosity and willingness to read more in reader's mind. "
 24 |     "Make sure to write in pure markdown format, with the blog title in H1 heading, "
 25 |     "and teaser in paragraph format.",
 26 |     # Step 2
 27 |     "Step 2: On the basis of the user intent for asking {primary_keywords}, set up a base ground of knowledge. "
 28 |     "Write facts and theories on this topic, add well-known data points and sources here. "
 29 |     "Use maximum 250 words for the content. Don't reach any conclusion yet. "
 30 |     "\nMake sure to write in pure markdown format, with headings and subheadings (H2 to H3), "
 31 |     "paragraphs, lists and text formating (such as bold, italic, strikethrough, etc)."
 32 |     "\nLink 2-3 other of my blog posts (found in the sitemap posted below) within the content. "
 33 |     "Make sure to sound natural when linking to other blog posts, i.e., the text can only be slightly altered to accommodate a better context for the link. "
 34 |     "Make sure to use the anchor text is be the actual title of the other blog post, but rather something in the text that goes along the rationale. "
 35 |     "Sitemap: {sitemap_urls}",
 36 |     # Step 3
 37 |     "Step 3: If applicable, explain step by step how to do the required actions for the user intent in {primary_keywords}. "
 38 |     "Use maximum 400 words for the content. Don't reach any conclusion yet."
 39 |     "Make sure to write in pure markdown format, with headings and subheadings (H2 to H3), "
 40 |     "paragraphs, lists and text formating (such as bold, italic, strikethrough, etc).",
 41 |     # Step 4
 42 |     "Step 4: Introduce {service_name}, described as {service_description}"
 43 |     "Explain to the user how {service_name} can help them with their problem. "
 44 |     "Make sure to link {service_url} in the content. "
 45 |     "Demonstrate how to use {service_name} in easy steps. Don't go beyond what is mentioned in the service description. "
 46 |     "Use maximum 100 words for the content. Don't reach any conclusion yet. "
 47 |     "Make sure to write in pure markdown format, with headings and subheadings (H2 to H3), "
 48 |     "paragraphs, lists and text formating (such as bold, italic, strikethrough, etc).",
 49 |     # Step 5
 50 |     "Step 5: Generate a conclusion based on the content of this blog. Use {tone_of_writing} tone to"
 51 |     "ease the user intent to take the next step on {primary_keywords}. "
 52 |     "Use maximum 150 words for the content."
 53 |     "Make sure to write in pure markdown format, with headings and subheadings (H1 to H4), "
 54 |     "paragraphs, lists and text formating (such as bold, italic, strikethrough, etc).",
 55 | ]
 56 | 
 57 | def load_sitemap_and_extract_urls(sitemap_path):
 58 |     # Parse the XML file
 59 |     tree = ET.parse(sitemap_path)
 60 |     root = tree.getroot()
 61 | 
 62 |     # Namespace, often found in sitemap files
 63 |     namespace = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
 64 | 
 65 |     # Extract URLs
 66 |     urls = [elem.text for elem in root.findall('ns:url/ns:loc', namespace)]
 67 |     return urls
 68 | 
 69 | def generate_blog_for_keywords(primary_keywords="knee replacement surgery", service_name=SERVICE_NAME, service_description=SERVICE_DESCRIPTION, service_url=SERVICE_URL):
 70 |     # Iterate through each example
 71 |     messages = []
 72 | 
 73 |     filepath = create_file_with_keyword(primary_keywords)
 74 |     log_info(f'🗂️  File Created {filepath}')
 75 |     open_file_with_md_app(filepath)
 76 | 
 77 |     secondary_keywords = get_related_queries(primary_keywords)
 78 |     log_info(f'🎬  Primary Keywords: {primary_keywords}')
 79 |     log_info(f'📗  Secondary Keywords: {secondary_keywords}')
 80 | 
 81 |     # Create the system message with primary and secondary keywords
 82 |     system_message_1 = f"SYSTEM: Act as an experienced SEO specialist and experienced content writer. " \
 83 |                        f"Given a blog with topic {primary_keywords}, help in generating rich content " \
 84 |                        f"for SEO optimized blog." \
 85 |                        f"Write custom heading for this response. " \
 86 |                        f"Naturally use primary Keywords: [{primary_keywords}], and " \
 87 |                        f"secondary keywords: [{secondary_keywords}] wherever it fits." \
 88 |                        f"Use John Gruber’s Markdown to format your responses." \
 89 |                        f"Use original content, avoid plagiarism, increase readability."
 90 | 
 91 |     log_info(f'🤖  System:\n{system_message_1}\n\n')
 92 |     messages.append({"role": "system", "content": system_message_1})
 93 | 
 94 |     tone_of_writing = find_tone_of_writing(primary_keywords, messages)
 95 |     
 96 |     sitemap_path = 'sitemap.xml'
 97 |     sitemap_urls = load_sitemap_and_extract_urls(sitemap_path)
 98 |     log_info(f'🗺️  Sitemap URLs: {sitemap_urls}')
 99 | 
100 |     i = 1
101 |     total_words = 0
102 |     already_sourced = []
103 |     for step_prompt in steps_prompts:
104 |         # Pre-defined prompt
105 |         prompt = step_prompt.format(primary_keywords=primary_keywords, 
106 |                                     tone_of_writing=tone_of_writing, 
107 |                                     service_name=service_name, 
108 |                                     service_description=service_description, 
109 |                                     service_url=service_url, 
110 |                                     sitemap_urls=sitemap_urls
111 |                                     )
112 |         log_info(f'⏭️  Step {i} # Predefined Prompt: {prompt}')
113 |         messages.append({"role": "user", "content": prompt})
114 | 
115 |         # Check for better prompt
116 |         if i > 2:
117 |             better_prompt = require_better_prompt(primary_keywords, prompt, messages)
118 |             if better_prompt:
119 |                 prompt = better_prompt
120 | 
121 |         # Add image
122 |         add_image = False
123 |         if add_image:
124 |             image_content, already_sourced = get_image_with_commercial_usage(primary_keywords, prompt, already_sourced)
125 |             if image_content:
126 |                 append_content_to_file(filepath, image_content, st if CLI else None)
127 | 
128 |         # Add News
129 |         news_data = require_data_for_prompt(primary_keywords, prompt)
130 |         if news_data:
131 |             messages.append({"role": "assistant", "content": f"Found news on the topic: {news_data}"})
132 | 
133 |         response = chat_with_open_ai(messages, temperature=0.9)
134 |         messages.append({"role": "assistant", "content": response})
135 | 
136 |         append_content_to_file(filepath, response, st if CLI else None)
137 |         log_info(f'🔺 ️Completed Step {i}. Words: {len(response.split(" "))}')
138 | 
139 |         i += 1
140 |         total_words += len(response.split(" "))
141 | 
142 |     #footer_message = f"🎁  Finished generation at {datetime.datetime.now()}. 📬  Total words: {total_words}"
143 |     #append_content_to_file(filepath, footer_message, st if CLI else None)
144 |     
145 |     # Read the generated content
146 |     with open(filepath, 'r') as file:
147 |         content = file.read()
148 | 
149 |     # Generate ToC
150 |     toc = build_toc(filepath)
151 | 
152 |     # Insert ToC at the beginning of the content
153 |     content_with_toc = toc + "\n\n" + content
154 | 
155 |     # Rewrite the file with ToC
156 |     with open(filepath, 'w') as file:
157 |         file.write(content_with_toc)
158 | 
159 | 
160 | 
161 | def run_streamlit_app():
162 |     st.title("📝BLOGEN v0.1 (Blog Generation Algorithm)")
163 | 
164 |     # Add a text input field
165 |     input_text = st.text_input("Enter some text:")
166 | 
167 |     # Add a submit button
168 |     if st.button("Submit"):
169 |         # Execute the function with the input text
170 |         generate_blog_for_keywords(input_text)
171 | 
172 | 
173 | def run_terminal_app(keywords):
174 |     generate_blog_for_keywords(keywords, SERVICE_NAME, SERVICE_DESCRIPTION, SERVICE_URL)
175 | 
176 | 
177 | if __name__ == "__main__":
178 |     CLI = True
179 |     setup_logger()
180 | 
181 |     if CLI:
182 |         _keywords = " ".join(sys.argv[1:])
183 |         if _keywords.strip() == "":
184 |             print("Error: keywords not specified.\nUSAGE: python blog_gen_algo_v0.1.py <keywords>")
185 |         while True:
186 |             if _keywords.strip() == "":
187 |                 _keywords = input("\nEnter the primary keywords:")
188 |             else:
189 |                 break
190 | 
191 |         log_info('Starting BLOGEN...')
192 |         run_terminal_app(_keywords)
193 | 
194 |     else:
195 |         run_streamlit_app()
196 | 


--------------------------------------------------------------------------------
/blog_gen_algo_v0.2.py:
--------------------------------------------------------------------------------
  1 | from urllib.parse import urlparse
  2 | from datetime import datetime
  3 | import argparse
  4 | 
  5 | import streamlit as st
  6 | import xml.etree.ElementTree as ET
  7 | 
  8 | from tools.chatgpt import chat_with_open_ai
  9 | from tools.file import create_file_with_keyword, append_content_to_file
 10 | from tools.logger import log_info, setup_logger
 11 | from tools.scraper import fetch_and_parse
 12 | from tools.serpapi import get_related_queries, get_image_with_commercial_usage, get_search_urls
 13 | from tools.storyblok import post_article_to_storyblok
 14 | from tools.subprocess import open_file_with_md_app
 15 | from tools.const import OPENAI_TEMPERATURE, SERVICE_NAME, SERVICE_DESCRIPTION, SERVICE_URL
 16 | from tokencost import calculate_prompt_cost, calculate_completion_cost
 17 | 
 18 | # Step-to-Model Mapping: Define your model preferences here
 19 | step_to_model = {
 20 |     1: 'gpt-4-0125-preview', # Outline
 21 |     2: 'gpt-3.5-turbo', # Introduction
 22 |     3: 'gpt-4-0125-preview', # Body (...)
 23 |     4: 'gpt-4-0125-preview',
 24 |     5: 'gpt-4-0125-preview',
 25 |     6: 'gpt-4-0125-preview',
 26 |     7: 'gpt-4-0125-preview',
 27 |     8: 'gpt-4-0125-preview',
 28 |     9: 'gpt-4-0125-preview',
 29 |     10: 'gpt-4-0125-preview', # Conclusion
 30 |     11: 'gpt-3.5-turbo', # Related Posts
 31 |     12: 'gpt-3.5-turbo', # Meta Description
 32 |     13: 'gpt-3.5-turbo', # Title
 33 |     14: 'gpt-3.5-turbo', # Key Takeaways
 34 |     15: 'gpt-3.5-turbo', # ToC
 35 | }
 36 | 
 37 | 
 38 | steps_prompts = [
 39 |     # Step 1
 40 |     "Given the primary keywords - {primary_keywords}, the first step will be an outline of the article with 5-6 headings and respective subheadings. "
 41 |     "Take into consideration the summary of the first 10 search results for the keyword: {summary_of_search_results}"
 42 |     ,
 43 |      # Step 2
 44 |     "The second step is to write the introduction of the article, without any H2 title. Aim at 50-60 words, be concise yet impactful. "
 45 |     ,
 46 |     # Step 3
 47 |     "You will proceed to write the first point of the outline (if this point doesn't exist, simply don't respond). "
 48 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
 49 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
 50 |     "Whenever relevant highlight tools that can help the user, "
 51 |     "cover templates that allow the user to simply copy-paste " 
 52 |     "and include references to other websites if helpful for the user. "
 53 |     ,
 54 |     # Step 4
 55 |     "You will proceed to write the second point of the outline (if this point doesn't exist, simply don't respond). "
 56 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
 57 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
 58 |     "Whenever relevant highlight tools that can help the user, "
 59 |     "cover templates that allow the user to simply copy-paste " 
 60 |     "and include references to other websites if helpful for the user. "
 61 |     ,
 62 |     # Step 5
 63 |     "You will proceed to write the third point of the outline (if this point doesn't exist, simply don't respond). "
 64 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
 65 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
 66 |     "Whenever relevant highlight tools that can help the user, "
 67 |     "cover templates that allow the user to simply copy-paste " 
 68 |     "and include references to other websites if helpful for the user. "
 69 |     ,
 70 |     # Step 6
 71 |     "You will proceed to write the fourth point of the outline (if this point doesn't exist, simply don't respond). "
 72 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
 73 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
 74 |     "Whenever relevant highlight tools that can help the user, "
 75 |     "cover templates that allow the user to simply copy-paste " 
 76 |     "and include references to other websites if helpful for the user. "
 77 |     ,
 78 |     # Step 7
 79 |     "You will proceed to write the fifth point of the outline (if this point doesn't exist, simply don't respond). "
 80 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
 81 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
 82 |     "Whenever relevant highlight tools that can help the user, "
 83 |     "cover templates that allow the user to simply copy-paste " 
 84 |     "and include references to other websites if helpful for the user. "
 85 |     ,
 86 |     # Step 8
 87 |     "You will proceed to write the sixth point of the outline (if this point doesn't exist, simply don't respond). "
 88 |     "If applicable, explain step by step how to do the required actions for the user intent in the keyword provided. "
 89 |     "Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "
 90 |     "Whenever relevant highlight tools that can help the user, "
 91 |     "cover templates that allow the user to simply copy-paste " 
 92 |     "and include references to other websites if helpful for the user. "
 93 |     ,
 94 |     # Step 9
 95 |     "You will create a concisive conclusion paragraph, with H2 heading 'Conclusion'. "
 96 |     "Define the anchor link with the following format: ## <a name='h2-title'></a>H2 Title "
 97 |     ,
 98 |     # Step 10
 99 |     "You will create five unique Frequently Asked Questions (FAQs) after the conclusion. "
100 |     "The FAQs need to take the keyword into account at all times. "
101 |     "Make sure to add an anchor link to the H2 heading 'Frequently Asked Questions (FAQs)', with two new lines after the heading. "
102 |     "Define the anchor link with the following format: ## <a name='h2-title'></a>H2 Title "
103 |     "The FAQs should have the questions in H3 heading and the answers below (separated by a new line), "
104 |     "with the format: "
105 |     "### Question? "
106 |     "Answer"
107 |     ,
108 |     # Step 11
109 |     "Please create a related posts section (with H2 heading 'Related Posts'), with two new lines after the heading. "
110 |     "Include 3-4 articles that are the most relevant to this topic out of the existing blog posts described in the sitemap below: {sitemap_urls}. "
111 |     "The bullets should have the title of the article directly with the link to the article - e.g., in markdown [title](link)."
112 |     ,
113 |     # Step 12
114 |     "Please create a meta description (120-140 characters) for the article you just generated."
115 |     ,
116 |     # Step 13
117 |     "Please create 5 variations of a slightly click-baity (to invite the reader to click the link), SEO-optimized title (50-60 characters) for the article below. "
118 |     "Make sure to include the problem it is solving. Avoid futuristic and corporate type of words, phrase it as an How-To or even a Question. "
119 |     "The title should be in the format: 'Keyword - Subtitle', but only if the keyword fits well in the title. Don't use quotes or special characters in the title. "
120 |     "Present the titles in a single line (no bullets or numbers), each separated by a semicolon."
121 |     ,
122 |     # Step 14
123 |     "Create a Key Takeaways section summarising crucial points. "
124 |     "Make sure to use the H2 heading 'Key Takeaways' with two new lines after the heading. "
125 |     "The Key Takeaways should be in bullet format, with the format: "
126 |     "- Takeaway 1"
127 |     "\n- Takeaway 2"
128 |     ,
129 |     # Step 15
130 |     "Create a table of contents (ToC) for the article, only keeping H2 headings and excluding Key Takaways and Introduction. "
131 |     "Do not create a 'Table of Contents' H2 heading. "
132 |     "Make sure to include links to each section in the ToC, with the format: "
133 |     "[H2 Title](#h2-title)"
134 |     ,
135 | ]
136 | 
137 | def load_sitemap_and_extract_urls(sitemap_path):
138 |     # Parse the XML file
139 |     tree = ET.parse(sitemap_path)
140 |     root = tree.getroot()
141 | 
142 |     # Namespace, often found in sitemap files
143 |     namespace = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
144 | 
145 |     # Extract URLs
146 |     urls = [elem.text for elem in root.findall('ns:url/ns:loc', namespace)]
147 |     return urls
148 | 
149 | def generate_blog_for_keywords(primary_keywords="knee replacement surgery", service_name=SERVICE_NAME, service_description=SERVICE_DESCRIPTION, service_url=SERVICE_URL, post_to_storyblok=False):
150 |     # Iterate through each example
151 |     messages = []
152 |     payload = {"title": "", "metadescription": "", "intro": "", "body": "", "conclusion": "", "related_posts": "", "faqs": "", "keyword": primary_keywords, "key_takeaways": "", "toc": ""}
153 | 
154 |     filepath = create_file_with_keyword(primary_keywords)
155 |     log_info(f'🗂️  File Created {filepath}')
156 |     open_file_with_md_app(filepath)
157 | 
158 |     log_info(f'🎬 Primary Keywords: {primary_keywords}')
159 |     
160 |     summarized_contents = []
161 |     total_cost = 0
162 |     urls = get_search_urls(primary_keywords, number_of_results=10)
163 |     for url in urls:
164 |         content = fetch_and_parse(url)
165 |         if content:
166 |             # Summarize the content using OpenAI
167 |             summarisation_model = "gpt-3.5-turbo"
168 |             summary_prompt = f"Create a knowledge base of the maximum number of tools, templates and references, in 300 words or less: {content[:3000]}"
169 |             summary = chat_with_open_ai([{"role": "user", "content": summary_prompt}], model=summarisation_model) 
170 |             summarized_contents.append(summary)
171 |             prompt_cost = calculate_prompt_cost(summary_prompt, model=summarisation_model)
172 |             completion_cost = calculate_completion_cost(summary, model=summarisation_model)
173 |             total_cost += prompt_cost + completion_cost
174 |             
175 |     if summarized_contents:
176 |         concatenated_summaries = " ".join(summarized_contents)  # Combine all summaries into one large text
177 |         summary_of_search_results_prompt = f"Summarize the following content in 300 words or less, focusing on covering as many tools, templates and references as possible: {concatenated_summaries}"
178 |         summary_of_search_results = chat_with_open_ai([{"role": "user", "content": summary_of_search_results_prompt}], model=summarisation_model) 
179 |         log_info(f"Summary of search results: {summary_of_search_results}\nCost: {total_cost}")
180 |                     
181 |     # Create the system message with primary and secondary keywords
182 |     system_message_1 = f"You are an AI coding writing expert with vast experience in writing techniques and frameworks. "\
183 |                         f"As a skilled content creator, you will craft a 100% unique, human-written, and SEO-optimized article in fluent English that is informative. You will focus exclusively on the keyword provided by the user. "\
184 |                         f"You will write a long-form article that is well-structured and easy to read, with introduction, a body, a conclusion and five unique FAQs. "\
185 |                         f"Use John Gruber's Markdown to format your responses, with appropriate H2 to H4 tags. Use original content, avoid plagiarism, and increase readability. "\
186 |                         f"Avoid bullshitter talk, focus on being helpful, reliable and create people-first content to the reader, as per Google's guidelines: https://developers.google.com/search/docs/fundamentals/creating-helpful-content "\
187 |                         f"Make sure to follow Google's helpful content guidelines and EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) into the section creation process. "\
188 |                         f"You explain step by step how to do the required actions for the user intent in the keyword provided. Give examples of tools and explain how to use them. Always link to their websites to ensure the credit is given to them (include the links as anchors in the most relevant words). "\
189 |                         f"Use an informational tone and always reference the sources you use with the website links, including if you are referring to competitors of Emilio as we need to maximize collaboration in the network. "\
190 |                         f"Whenever relevant, introduce Emilio (with website https//getemil.io/)and explain to the user how Emilio can help them with their problem. "\
191 |                         f"Emilio’s is an AI-powered email client designed to save users time. Key functionalities include sorting prioritized emails, summarizing messages and digesting the inbox, drafting emails with the user's tone, and requiring no installation as it operates in the background. The service integrates with the user's existing Gmail account. "\
192 |                         f"The interaction with the user will take several steps below. You will take the necessary time in every step, and do one at a time to ensure the maximum quality possible."
193 | 
194 |     messages.append({"role": "system", "content": system_message_1})
195 | 
196 |     sitemap_path = 'sitemap.xml'
197 |     sitemap_urls = load_sitemap_and_extract_urls(sitemap_path)
198 | 
199 |     i = 1
200 |     total_words = 0
201 |     total_cost = 0
202 |     for step_prompt in steps_prompts:
203 |         # Pre-defined prompt
204 |         prompt = step_prompt.format(primary_keywords=primary_keywords, 
205 |                                     #tone_of_writing=tone_of_writing, 
206 |                                     service_name=service_name, 
207 |                                     service_description=service_description, 
208 |                                     service_url=service_url, 
209 |                                     sitemap_urls=sitemap_urls,
210 |                                     summary_of_search_results=summary_of_search_results
211 |                                     )
212 |         messages.append({"role": "user", "content": prompt})
213 | 
214 |         model = step_to_model.get(i, 'gpt-4-0125-preview')  # Fallback to a default model if not specified
215 |         prompt_cost = calculate_prompt_cost(messages, model)
216 |         
217 |         response = chat_with_open_ai(messages, model=model, temperature=OPENAI_TEMPERATURE)
218 |         completion_cost = calculate_completion_cost(response, model)
219 |         total_cost += prompt_cost + completion_cost
220 |         
221 |         messages.append({"role": "assistant", "content": response})
222 | 
223 |         # Don't append the response of the first step
224 |         if i > 1:
225 |             append_content_to_file(filepath, response, st if CLI else None)
226 |         log_info(f'🔺 ️Completed Step {i}. Words: {len(response.split(" "))}, Cost: {prompt_cost + completion_cost}')
227 |         
228 |         # Capture the response for each section
229 |         if i == 2:  # Assuming intro is captured here
230 |             payload['intro'] += response
231 |         elif 3 <= i <= 8:  # Assuming body is constructed here
232 |             payload['body'] += response + "\n"
233 |         elif i == 9:  # Conclusion
234 |             payload['conclusion'] += response
235 |         elif i == 10:  # FAQs
236 |             payload['faqs'] += response
237 |         elif i == 11:  # Related posts
238 |             payload['related_posts'] += response
239 |         elif i == 12:  # Meta description
240 |             payload['metadescription'] += response
241 |         elif i == 13:  # Title
242 |             payload['title'] += response
243 |         elif i == 14:  # Key Takeaway
244 |             payload['key_takeaways'] += response
245 |         elif i == 15:  # ToC
246 |             payload['toc'] += response
247 | 
248 |         i += 1
249 |         total_words += len(response.split(" "))
250 |     
251 |     # Reassemble the content according to the new order
252 |     ## Split titles string into a list of individual titles
253 |     titles_list = payload['title'].split(';')
254 |     ## Construct the titles section with HTML comment and Markdown headings
255 |     titles_section = "<!--- Suggested Titles / H1 (pick one each) -->\n"
256 |     for title in titles_list:
257 |         titles_section += f"# {title.strip()}\n"
258 |     titles_section += "\n"
259 |     
260 |     # Assemble the metadata
261 |     ## Assuming 'payload['title']' contains multiple titles separated by ";", choose the first one for the metadata
262 |     title_for_metadata = payload['title'].split(';')[0].strip()
263 |     ## Format the current date in YYYY-MM-DD format for the metadata
264 |     current_date = datetime.now().strftime("%Y-%m-%d")
265 |     ## Construct the metadata section
266 |     metadata_section = (
267 |         "---\n"
268 |         f"title: {title_for_metadata}\n"
269 |         f"date: {current_date}\n"
270 |         "authors:\n"
271 |         "  - name: \n"
272 |         "    title: \n"
273 |         "    picture: \n"
274 |         "tags: [\"1\", \"2\", \"2\", \"4\"]\n"
275 |         "---\n\n"
276 |     )
277 |     
278 |     ordered_content = (metadata_section + 
279 |         "<!--- Metadescription -->\n"  + payload['metadescription'] + "\n\n" +
280 |         titles_section + "\n" +
281 |         "<!--- Introduction -->\n" + payload['intro'] + "\n\n" +
282 |         "<!--- Table of Contents -->\n" + payload['toc'] + "\n\n" +
283 |         payload['key_takeaways'] + "\n\n" +
284 |         "<!--- Body -->\n" + payload['body'] + "\n\n" +
285 |         payload['conclusion'] + "\n\n" +
286 |         payload['related_posts'] + "\n\n" +
287 |         payload['faqs']
288 |     )
289 |     
290 |     # Now, instead of posting to Storyblok, first write to the file
291 |     with open(filepath, 'w') as file:
292 |         file.write(ordered_content)
293 | 
294 |     log_info(f'Total cost of operation: {total_cost}')
295 | 
296 |     # At the end of the loop, send the payload to Storyblok
297 |     if post_to_storyblok:
298 |         post_article_to_storyblok(payload)
299 | 
300 | def run_streamlit_app():
301 |     st.title("📝BLOGEN v0.2 (Blog Generation Algorithm)")
302 | 
303 |     # Add a text input field
304 |     input_text = st.text_input("Enter some text:")
305 | 
306 |     # Add a submit button
307 |     if st.button("Submit"):
308 |         # Execute the function with the input text
309 |         generate_blog_for_keywords(input_text)
310 | 
311 | 
312 | def run_terminal_app(keywords, post_to_storyblok=False):
313 |     generate_blog_for_keywords(keywords, SERVICE_NAME, SERVICE_DESCRIPTION, SERVICE_URL,post_to_storyblok)
314 | 
315 | 
316 | if __name__ == "__main__":
317 |     CLI = True
318 |     setup_logger()
319 |     
320 |     # Set up argument parser
321 |     parser = argparse.ArgumentParser(description="Generate a blog post and optionally post to Storyblok.")
322 |     parser.add_argument('--storyblok', action='store_true', help="If set, post the generated content to Storyblok.")
323 |     parser.add_argument('keywords', metavar='keyword', type=str, nargs='+', help='Keywords for generating the blog post.')
324 |     
325 |     # Parse arguments
326 |     args = parser.parse_args()
327 |     
328 |     # Extract options and keywords
329 |     post_to_storyblok = args.storyblok
330 |     keywords = " ".join(args.keywords)
331 | 
332 |     log_info('Starting BLOGEN...')
333 |     
334 |     if keywords.strip() == "":
335 |         print("Error: No keywords specified. Please provide at least one keyword.")
336 |     else:
337 |         run_terminal_app(keywords, post_to_storyblok=post_to_storyblok)
338 | 


--------------------------------------------------------------------------------
/others.py:
--------------------------------------------------------------------------------
 1 | from tools.chatgpt import chat_with_open_ai
 2 | from tools.storyblok import fetch_articles
 3 | import csv
 4 | 
 5 | def extract_text(content):
 6 |     text = ""
 7 |     if isinstance(content, dict):
 8 |         if 'content' in content:
 9 |             for item in content['content']:
10 |                 text += extract_text(item)
11 |         if 'text' in content:
12 |             text += content['text']
13 |     elif isinstance(content, list):
14 |         for item in content:
15 |             text += extract_text(item)
16 |     return text
17 | 
18 | def preprocess_title(title):
19 |     # Replace new lines with spaces or any other suitable character/formatting
20 |     return title.replace("\n", " ").strip()
21 | 
22 | def improve_articles_names(articles, csv_filename):
23 |     with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
24 |         csvwriter = csv.writer(csvfile)
25 |         # Writing the header of the CSV file
26 |         csvwriter.writerow(['Slug', 'Original Title', 'Suggested Title 1', 'Suggested Title 2', 'Suggested Title 3', 'Suggested Title 4', 'Suggested Title 5'])
27 |         
28 |         # Process only the first 3 articles for testing
29 |         for article in articles:
30 |             slug = article['slug'] 
31 |             name = article['name']
32 |             body = extract_text(article['content']['body'])
33 |             
34 |             prompt = f"""
35 |             Please create 5 variations of a slightly click-baity (to invite the reader to click the link), SEO-optimized title (50-60 characters) for the article below. 
36 |             The keyword associated with the article is '{slug}'.
37 |             Make sure to include the problem it is solving. Avoid futuristic and corporate type of words, phrase it as an How-To or even a Question. 
38 |             The title should be in the format: 'Keyword: Subtitle', but only if the keyword fits well in the title. Don't use quotes or special characters in the title. 
39 |             Present the titles in a single line (no bullets or numbers), each separated by a semicolon. Respond only with the titles, no need to include any other information.
40 |             \nArticle: \n{body}
41 |             """
42 |             
43 |             suggested_names_str = chat_with_open_ai([{"role": "user", "content": prompt}], model='gpt-3.5-turbo')
44 |             suggested_names = suggested_names_str.split(';')
45 |             
46 |             # Preprocess each suggested title to handle new lines
47 |             suggested_names = [preprocess_title(title) for title in suggested_names]
48 |             
49 |             # Ensure that there are exactly 5 suggested titles
50 |             suggested_names = suggested_names[:5] + [''] * (5 - len(suggested_names))
51 |             
52 |             csvwriter.writerow([slug, name] + suggested_names)
53 |             
54 | if __name__ == "__main__":
55 |     articles = fetch_articles()
56 |     improve_articles_names(articles, 'suggested_titles.csv')
57 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | # Generated on Tue Jul 25 13:00:17 IST 2023
 2 | aiohttp==3.8.5
 3 | aiosignal==1.3.1
 4 | altair==5.0.1
 5 | async-timeout==4.0.2
 6 | attrs==23.1.0
 7 | blinker==1.6.2
 8 | cachetools==5.3.1
 9 | certifi==2023.7.22
10 | charset-normalizer==3.2.0
11 | click==8.1.6
12 | decorator==5.1.1
13 | frozenlist==1.4.0
14 | gitdb==4.0.10
15 | GitPython==3.1.32
16 | google-search-results==2.4.2
17 | idna==3.4
18 | importlib-metadata==6.8.0
19 | Jinja2==3.1.2
20 | jsonschema==4.18.4
21 | jsonschema-specifications==2023.7.1
22 | markdown-it-py==3.0.0
23 | MarkupSafe==2.1.3
24 | mdurl==0.1.2
25 | md_toc==8.2.2
26 | multidict==6.0.4
27 | numpy==1.25.1
28 | openai==0.27.8
29 | packaging==23.1
30 | pandas==2.0.3
31 | Pillow==9.5.0
32 | protobuf==4.23.4
33 | pyarrow==12.0.1
34 | pydeck==0.8.0
35 | Pygments==2.15.1
36 | Pympler==1.0.1
37 | python-dateutil==2.8.2
38 | python-dotenv==1.0.0
39 | pytz==2023.3
40 | pytz-deprecation-shim==0.1.0.post0
41 | referencing==0.30.0
42 | requests==2.31.0
43 | rich==13.4.2
44 | rpds-py==0.9.2
45 | six==1.16.0
46 | smmap==5.0.0
47 | streamlit==1.25.0
48 | tenacity==8.2.2
49 | tokencost==0.1.2
50 | toml==0.10.2
51 | toolz==0.12.0
52 | tornado==6.3.2
53 | tqdm==4.65.0
54 | typing_extensions==4.7.1
55 | tzdata==2023.3
56 | tzlocal==4.3.1
57 | urllib3==2.0.4
58 | validators==0.20.0
59 | yarl==1.9.2
60 | zipp==3.16.2
61 | markdown==3.3.4
62 | serpapi==0.1.5
63 | beautifulsoup4==4.12.3
64 | 


--------------------------------------------------------------------------------
/tools/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rbatista191/blogen/b7490f80be17862eb30f7e9368d589d2b5215686/tools/__init__.py


--------------------------------------------------------------------------------
/tools/chatgpt.py:
--------------------------------------------------------------------------------
 1 | from time import sleep
 2 | 
 3 | import openai
 4 | 
 5 | from tools.const import OPENAI_API_KEY, OPENAI_MAX_TOKENS
 6 | from tools.logger import log_info
 7 | 
 8 | openai.api_key = OPENAI_API_KEY
 9 | 
10 | def chat_with_open_ai(conversation, model="gpt-4-0125-preview", temperature=0):
11 |     max_retry = 3
12 |     retry = 0
13 |     messages = [{'role': x.get('role', 'assistant'),
14 |                  'content': x.get('content', '')} for x in conversation]
15 |     while True:
16 |         try:
17 |             response = openai.ChatCompletion.create(model=model, messages=messages, temperature=temperature)
18 |             text = response['choices'][0]['message']['content']
19 | 
20 |             # trim message object
21 |             debug_object = [i['content'] for i in messages]
22 |             debug_object.append(text)
23 |             if response['usage']['total_tokens'] >= OPENAI_MAX_TOKENS:
24 |                 messages = split_long_messages(messages)
25 |                 if len(messages) > 1:
26 |                     messages.pop(1)
27 | 
28 |             return text
29 |         except Exception as oops:
30 |             log_info(f'Error communicating with OpenAI: "{oops}"')
31 |             if 'maximum context length' in str(oops):
32 |                 messages = split_long_messages(messages)
33 |                 if len(messages) > 1:
34 |                     messages.pop(1)
35 |                 log_info(' DEBUG: Trimming oldest message')
36 |                 continue
37 |             retry += 1
38 |             if retry >= max_retry:
39 |                 log_info(f"Exiting due to excessive errors in API: {oops}")
40 |                 return str(oops)
41 |             log_info(f'Retrying in {2 ** (retry - 1) * 5} seconds...')
42 |             sleep(2 ** (retry - 1) * 5)
43 | 
44 | 
45 | def split_long_messages(messages):
46 |     new_messages = []
47 |     for message in messages:
48 |         content = message['content']
49 |         if len(content.split()) > 1000:
50 |             # Split the content into chunks of 4096 tokens
51 |             chunks = [content[i:i + 1000] for i in range(0, len(content), 1000)]
52 | 
53 |             # Create new messages for each chunk
54 |             for i, chunk in enumerate(chunks):
55 |                 new_message = {'role': message['role'], 'content': chunk}
56 |                 if i == 0:
57 |                     # Replace the original message with the first chunk
58 |                     new_messages.append(new_message)
59 |                 else:
60 |                     # Append subsequent chunks as new messages
61 |                     new_messages.append({'role': message['role'], 'content': chunk})
62 |         else:
63 |             new_messages.append(message)  # No splitting required, add original message as it is
64 | 
65 |     return new_messages
66 | 


--------------------------------------------------------------------------------
/tools/const.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | from dotenv import load_dotenv
 3 | 
 4 | load_dotenv()
 5 | 
 6 | BLOG_WRITING_TONES = [
 7 |     "Informative",
 8 |     "Conversational",
 9 |     "Inspirational/Motivational",
10 |     "Educational",
11 |     "Humorous",
12 |     "Thought-Provoking",
13 |     "Authoritative",
14 |     "Empathetic",
15 |     "Personal",
16 |     "Argumentative/Persuasive",
17 |     "Storytelling",
18 |     "Expository"
19 | ]
20 | 
21 | OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', '')
22 | OPENAI_MAX_TOKENS = int(os.getenv('OPENAI_MAX_TOKENS', 4000))
23 | OPENAI_TEMPERATURE = float(os.getenv('OPENAI_TEMPERATURE', 0.9))
24 | OPENAI_STOP_SEQ = os.getenv('OPENAI_STOP_SEQ', '\n')
25 | 
26 | STORYBLOK_MANAGEMENTAPI_TOKEN = os.getenv('STORYBLOK_MANAGEMENTAPI_TOKEN', '')
27 | STORYBLOK_CONTENTAPI_TOKEN = os.getenv('STORYBLOK_CONTENTAPI_TOKEN', '')
28 | STORYBLOK_SPACE_ID = os.getenv('STORYBLOK_SPACE_ID', '')
29 | 
30 | SERP_API_KEY = os.getenv('SERP_API_KEY', '')
31 | 
32 | SERVICE_NAME = os.getenv('SERVICE_NAME', '')
33 | SERVICE_DESCRIPTION = os.getenv('SERVICE_DESCRIPTION','')
34 | SERVICE_URL = os.getenv('SERVICE_URL', '')
35 | 
36 | 


--------------------------------------------------------------------------------
/tools/decision.py:
--------------------------------------------------------------------------------
 1 | from tools.chatgpt import chat_with_open_ai
 2 | from tools.logger import log_info
 3 | from tools.serpapi import get_latest_news
 4 | 
 5 | 
 6 | def find_tone_of_writing(primary_keywords, messages):
 7 |     new_messages = messages
 8 |     new_messages.append({"role": "user", "content": f"Which tones suites best in writing blog on {primary_keywords}? "
 9 |                                                     f"Give one word answer."})
10 |     tone_of_writing = chat_with_open_ai(new_messages, temperature=1)
11 |     return tone_of_writing
12 | 
13 | 
14 | def require_data_for_prompt(primary_keywords, next_prompt):
15 |     new_messages = [
16 |         {"role": "system", "content": "Act as an experienced SEO specialist and experienced content writer."},
17 |         {"role": "user", "content": f"You will be asked to respond to : {next_prompt} \n\n "
18 |                                     f"Would you require latest news on {primary_keywords}. "
19 |                                     f"Respond with either \"yes\" or \"no\"."}]
20 |     require_news = chat_with_open_ai(new_messages, temperature=1)
21 |     if "yes" in require_news.lower():
22 |         news_data = get_latest_news(primary_keywords, next_prompt)
23 |         log_info(f'🚨  Get News: {news_data}')
24 |         return news_data
25 |     else:
26 |         log_info(f'👷‍  No news')
27 |         return None
28 | 
29 | 
30 | def require_better_prompt(primary_keywords, next_prompt, messages):
31 |     new_messages = [
32 |         {"role": "system", "content": "Act as an experienced SEO specialist and experienced content writer."}
33 |     ]
34 |     for previous in messages:
35 |         if previous.get('role') == 'assistant':
36 |             new_messages.append(previous)
37 |     new_messages.append({"role": "user", "content": f"You will be asked to respond to : {next_prompt} \n\n "
38 |                                                     f"Can you suggest a better prompt judging the intent of user: {primary_keywords}. "
39 |                                                     f"Respond with only the prompt that you would ask a SEO specialist "
40 |                                                     f"or simply reply \"no\" if the given prompt is fine."})
41 |     better_prompt = chat_with_open_ai(new_messages, temperature=1)
42 |     if "no" in better_prompt.lower():
43 |         log_info(f'✅  Prompt Ok ')
44 |         return None
45 |     else:
46 |         log_info(f'🍀  Better Prompt: {better_prompt}')
47 |         return better_prompt
48 | 


--------------------------------------------------------------------------------
/tools/file.py:
--------------------------------------------------------------------------------
 1 | import datetime
 2 | import os
 3 | import sys
 4 | import re
 5 | 
 6 | from tools.logger import log_info
 7 | 
 8 | project_root = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
 9 | 
10 | # Add the project root directory to the Python path
11 | sys.path.append(project_root)
12 | 
13 | 
14 | def get_ordinal_suffix(number):
15 |     if 10 <= number % 100 <= 20:
16 |         suffix = "th"
17 |     else:
18 |         suffix = {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")
19 |     return suffix
20 | 
21 | 
22 | def append_content_to_file(filename, new_content, st=None):
23 |     if st:
24 |         st.write(new_content)
25 | 
26 |     with open(filename, "a") as file:
27 |         # Add a newline before writing the new content
28 |         file.write("\n\n" + new_content)
29 | 
30 | 
31 | def create_file_with_keyword(keywords, directory="_blogs", extension="md"):
32 |     if not os.path.exists(directory):
33 |         os.makedirs(directory)
34 | 
35 |     processed_keywords = re.sub(r'[^\w\s]', ' ', keywords)
36 |     processed_keywords = processed_keywords.strip().lower()
37 | 
38 |     title = '_'.join(processed_keywords.split(' '))
39 |     subdirectory = os.path.join(directory, title)
40 |     if not os.path.exists(subdirectory):
41 |         os.makedirs(subdirectory)
42 | 
43 |     # Find the number of files starting with the same keyword
44 |     num_files_with_same_keyword = sum(1 for file in os.listdir(subdirectory) if file.startswith(title))
45 |     if num_files_with_same_keyword > 0:
46 |         ordinal_suffix = get_ordinal_suffix(num_files_with_same_keyword + 1)
47 |         filename = f"{title}-{num_files_with_same_keyword + 1}{ordinal_suffix}.{extension}"
48 |     else:
49 |         filename = f"{title}.{extension}"
50 | 
51 |     filepath = os.path.join(subdirectory, filename)
52 | 
53 |     # Create the new file
54 |     #with open(filepath, "w") as file:
55 |     #    file.write(f"Runtime: {datetime.datetime.now()}")
56 | 
57 |     return filepath
58 | 


--------------------------------------------------------------------------------
/tools/logger.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import logging
 3 | import logging.handlers
 4 | 
 5 | from logging.handlers import TimedRotatingFileHandler
 6 | 
 7 | 
 8 | class CustomFormatter(logging.Formatter):
 9 |     grey = "\x1b[38;5;240m"
10 |     yellow = "\x1b[38;5;226m"
11 |     red = "\x1b[38;5;196m"
12 |     bold_red = "\x1b[31;1m"
13 |     reset = "\x1b[0m"
14 | 
15 |     format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)"
16 | 
17 |     FORMATS = {
18 |         logging.DEBUG: grey + format + reset,
19 |         logging.INFO: grey + format + reset,
20 |         logging.WARNING: yellow + format + reset,
21 |         logging.ERROR: red + format + reset,
22 |         logging.CRITICAL: bold_red + format + reset
23 |     }
24 | 
25 |     def format(self, record):
26 |         log_fmt = self.FORMATS.get(record.levelno)
27 |         formatter = logging.Formatter(log_fmt)
28 |         return formatter.format(record)
29 | 
30 | 
31 | def setup_logger(filename='theblogsystem.log'):
32 |     logger = logging.getLogger()
33 |     logger.setLevel(logging.INFO)
34 | 
35 |     # Create console handler with a different formatter (not JSON)
36 |     ch = logging.StreamHandler()
37 |     ch.setLevel(logging.INFO)
38 |     color_formatter = CustomFormatter()
39 |     ch.setFormatter(color_formatter)
40 | 
41 |     # Create rotating file handler
42 |     subdirectory = '_logs'
43 |     if not os.path.exists(subdirectory):
44 |         os.makedirs(subdirectory)
45 |     filepath = os.path.join(subdirectory, filename)
46 | 
47 |     fh = TimedRotatingFileHandler(filepath, when='D', interval=1, backupCount=30)
48 |     fh.setLevel(logging.INFO)
49 | 
50 |     # FIX: Use JsonFormatter for 3rd Party Tools
51 |     # jfh = jsonlogger.JsonFormatter('%(asctime)s %(levelname)s %(message)s')
52 |     # jfh.setFormatter(file_formatter)
53 | 
54 |     fh.setFormatter(color_formatter)
55 | 
56 |     # Add the handlers to the logger
57 |     logger.addHandler(ch)
58 |     logger.addHandler(fh)
59 | 
60 | 
61 | def construct_log_message(message, *args, **kwargs):
62 |     log_message = message
63 |     if args:
64 |         log_message += f" - {args}"
65 |     if kwargs:
66 |         log_message += f" - {kwargs}"
67 |     return log_message
68 | 
69 | 
70 | def log_error(message, *args, **kwargs):
71 |     logger = logging.getLogger(__name__)
72 |     logger.error(construct_log_message(message, *args, **kwargs))
73 | 
74 | 
75 | def log_info(message, *args, **kwargs):
76 |     logger = logging.getLogger(__name__)
77 |     logger.info(construct_log_message(message, *args, **kwargs))
78 | 
79 | 
80 | def log_debug(message, *args, **kwargs):
81 |     logger = logging.getLogger(__name__)
82 |     logger.debug(construct_log_message(message, *args, **kwargs))
83 | 
84 | def log_warn(message, *args, **kwargs):
85 |     logger = logging.getLogger(__name__)
86 |     logger.warn(construct_log_message(message, *args, **kwargs))
87 | 


--------------------------------------------------------------------------------
/tools/scraper.py:
--------------------------------------------------------------------------------
 1 | import requests
 2 | from bs4 import BeautifulSoup
 3 | 
 4 | def fetch_and_parse(url):
 5 |     try:
 6 |         response = requests.get(url, timeout=30)
 7 |         response.raise_for_status()  # Raises an HTTPError if the status is 4xx, 5xx
 8 |         soup = BeautifulSoup(response.text, 'html.parser')
 9 |         content = soup.find('body').get_text(separator='\n', strip=True)
10 |         return content
11 |     except Exception as e:
12 |         print(f"Failed to fetch {url}: {e}")
13 |         return None


--------------------------------------------------------------------------------
/tools/serpapi.py:
--------------------------------------------------------------------------------
  1 | from serpapi import GoogleSearch
  2 | 
  3 | from tools.chatgpt import chat_with_open_ai
  4 | from tools.const import SERP_API_KEY
  5 | from tools.logger import log_info
  6 | 
  7 | def get_search_urls(keyword, number_of_results=5):
  8 |     params = {
  9 |         "engine": "google",
 10 |         "q": keyword,
 11 |         "api_key": SERP_API_KEY,
 12 |     }
 13 |     search = GoogleSearch(params)
 14 |     results = search.get_dict()
 15 |     search_results = results.get("organic_results", [])
 16 |     urls = [result["link"] for result in search_results[:number_of_results]]
 17 |     return urls
 18 | 
 19 | 
 20 | def get_related_queries(keyword):
 21 |     params = {
 22 |         "engine": "google_trends",
 23 |         "q": keyword,
 24 |         "data_type": "RELATED_QUERIES",
 25 |         "api_key": SERP_API_KEY
 26 |     }
 27 | 
 28 |     search = GoogleSearch(params)
 29 |     results = search.get_dict()
 30 |     related_queries = results.get("related_queries", {})
 31 | 
 32 |     # Extract rising and top queries separately and combine them into a single list
 33 |     rising_queries = [query["query"] for query in related_queries.get("rising", [])]
 34 |     top_queries = [query["query"] for query in related_queries.get("top", [])]
 35 |     all_queries = rising_queries[:2] + top_queries[:2]
 36 | 
 37 |     return ", ".join(all_queries)
 38 | 
 39 | 
 40 | def get_latest_news(keywords, prompt):
 41 |     messages = [
 42 |         {"role": "user", "content": f"Act as an experienced SEO specialist and experienced content writer. "
 43 |                                     f"Given keywords - [{keywords}], and a prompt [{prompt}] "
 44 |                                     f"Find the necessary 2-3 keywords related to primary keywords "
 45 |                                     f"from the given prompt to search news from Google News. "
 46 |                                     f"Respond only with those keywords comma separated"}
 47 |     ]
 48 |     keywords = chat_with_open_ai(messages, temperature=1)
 49 | 
 50 |     log_info(f'🈂️  Keywords for news: {keywords}')
 51 |     params = {
 52 |         "engine": "google",
 53 |         "q": keywords,
 54 |         "tbm": "nws",
 55 |         "api_key": SERP_API_KEY
 56 |     }
 57 | 
 58 |     search = GoogleSearch(params)
 59 |     results = search.get_dict()
 60 | 
 61 |     # Extract news results and format them into a consumable string
 62 |     news_list = results.get("news_results", [])
 63 |     news_string = ""
 64 |     for news in news_list[:5]:
 65 |         title = news.get("title", "")
 66 |         link = news.get("link", "")
 67 |         source = news.get("source", "")
 68 |         published_date = news.get("published_date", "")
 69 |         summary = news.get("snippet", "")
 70 | 
 71 |         news_string += f"Title: {title}\n"
 72 |         news_string += f"Link: {link}\n"
 73 |         news_string += f"Source: {source}\n"
 74 |         news_string += f"Published Date: {published_date}\n"
 75 |         news_string += f"Summary: {summary}\n\n"
 76 | 
 77 |     return news_string
 78 | 
 79 | 
 80 | def get_image_with_commercial_usage(keywords, prompt, already_sourced):
 81 |     # messages = [
 82 |     #     {"role": "user", "content": f"Act as an experienced SEO specialist and experienced content writer. "
 83 |     #                                 f"Given primary keywords - [{keywords}], and a prompt [{prompt}] "
 84 |     #                                 f"Find the necessary 2-3 keywords related to primary keywords "
 85 |     #                                 f"from the given prompt to search images from Google Images. "
 86 |     #                                 f"Respond only with those keywords comma separated."}
 87 |     # ]
 88 |     # keywords = chat_with_open_ai(messages, temperature=1)
 89 |     # if "no" in keywords.lower():
 90 |     #     return None, already_sourced
 91 | 
 92 |     log_info(f'🏞️  Keywords for image: {keywords}')
 93 |     params = {
 94 |         "engine": "google",
 95 |         "q": keywords,
 96 |         "tbm": "isch",
 97 |         "tbs": "sur:fmc",
 98 |         "api_key": SERP_API_KEY
 99 |     }
100 |     search = GoogleSearch(params)
101 |     results = search.get_dict()
102 | 
103 |     image_results = results.get("images_results", [])
104 | 
105 |     for image in image_results:
106 |         image_source = image.get("source", "")
107 |         image_url = image.get("original", "")
108 |         if image_source in already_sourced or image_url in already_sourced:
109 |             continue
110 |         already_sourced.append(image_source)
111 |         already_sourced.append(image_url)
112 |         image_title = image.get("title", "")
113 |         image_content = f"![{image_title}]({image_url})\n Source: {image_source}\n\n"
114 |         return image_content, already_sourced
115 | 
116 |     return None, already_sourced
117 | 


--------------------------------------------------------------------------------
/tools/storyblok.py:
--------------------------------------------------------------------------------
 1 | import requests
 2 | import json
 3 | from tools.const import STORYBLOK_MANAGEMENTAPI_TOKEN, STORYBLOK_CONTENTAPI_TOKEN, STORYBLOK_SPACE_ID
 4 | from tools.logger import log_info
 5 | import markdown
 6 | 
 7 | # Variables
 8 | mgmtapi_token = STORYBLOK_MANAGEMENTAPI_TOKEN
 9 | cntapi_token = STORYBLOK_CONTENTAPI_TOKEN
10 | space_id = STORYBLOK_SPACE_ID
11 | 
12 | 
13 | def post_article_to_storyblok(article_data):
14 |     url = f"https://mapi.storyblok.com/v1/spaces/{space_id}/stories/"
15 |     headers = {
16 |         "Content-Type": "application/json",
17 |         "Authorization": f"{mgmtapi_token}",
18 |     }
19 |     
20 |     # Convert Markdown fields to HTML
21 |     intro_html = markdown.markdown(article_data["intro"])
22 |     body_html = markdown.markdown(article_data["body"])
23 |     conclusion_html = markdown.markdown(article_data["conclusion"])
24 |     related_posts_html = markdown.markdown(article_data["related_posts"])
25 |     faqs_html = markdown.markdown(article_data["faqs"])
26 |     key_takeaways_html = markdown.markdown(article_data["key_takeaways"])
27 |     toc_html = markdown.markdown(article_data["toc"])
28 |     
29 |     # Structure the payload according to Storyblok's requirements
30 |     payload = {
31 |         "story": {
32 |             "name": article_data["title"],
33 |             "slug": article_data["keyword"].lower().replace(" ", "-").replace("_", "-"),
34 |             "content": {
35 |                 "component": "article",
36 |                 "title": article_data["title"],
37 |                 "metadescription": article_data["metadescription"],
38 |                 "intro": intro_html,
39 |                 "body": body_html,
40 |                 "conclusion": conclusion_html,
41 |                 "related_posts": related_posts_html,
42 |                 "faqs": faqs_html,
43 |                 "key_takeaways": key_takeaways_html,
44 |                 "toc": toc_html,
45 |             }
46 |         },
47 |     }
48 |     
49 |     # For creating a new story
50 |     response = requests.post(url, data=json.dumps(payload), headers=headers)
51 |     print(response)
52 | 
53 |     # For updating an existing story, use PUT request instead
54 |     # response = requests.put(f"https://mapi.storyblok.com/v1/spaces/{space_id}/stories/{story_id}", json=data, headers=headers)
55 |     
56 |     if response.status_code == 200 or response.status_code == 201:
57 |         log_info("Article posted successfully!")
58 |         return response.json()
59 |     else:
60 |         log_info(f"Failed to post article. Status code: {response.status_code}, Message: {response.text}")
61 |         return None
62 |     
63 | def fetch_articles():
64 |     url = f"https://api.storyblok.com/v1/cdn/stories?version=published&token={cntapi_token}&space_id={space_id}"
65 |     response = requests.get(url)
66 |     if response.status_code == 200:
67 |         articles = response.json().get('stories', [])
68 |         return articles  # Return all article components
69 |     else:
70 |         log_info(f"Failed to fetch articles. Status code: {response.status_code}, Message: {response.text}")
71 |         return []
72 | 
73 | def update_article_in_storyblok(article_id, article_data, slug=""):
74 |     url = f"https://mapi.storyblok.com/v1/spaces/{space_id}/stories/{article_id}"
75 |     headers = {
76 |         "Content-Type": "application/json",
77 |         "Authorization": f"{mgmtapi_token}",
78 |     }
79 |     
80 |     response = requests.put(url, data=json.dumps(article_data), headers=headers)
81 |     
82 |     if response.status_code == 200:
83 |         print(f"Article '{slug}' updated successfully!")
84 |         return response.json()
85 |     else:
86 |         print(f"Failed to update article. Status code: {response.status_code}, Message: {response.text}")
87 |         return None
88 | 


--------------------------------------------------------------------------------
/tools/subprocess.py:
--------------------------------------------------------------------------------
 1 | import subprocess
 2 | 
 3 | 
 4 | def open_file_with_md_app(filepath):
 5 |     try:
 6 |         subprocess.run(["open", "-a", "Macdown.app", filepath], check=True)
 7 |     except subprocess.CalledProcessError as e:
 8 |         print(f"Error occurred while opening the file: {e}")
 9 |     except FileNotFoundError:
10 |         print("The 'open' command is not available on this system (non-macOS).")
11 | 


--------------------------------------------------------------------------------