├── .env.example ├── .gitignore ├── LICENSE.txt ├── README.md ├── output.gif ├── output2.gif ├── reddit2md.py ├── reddit2txt.py ├── redditpost.py ├── requirements.txt ├── run-redditpost └── stehos-reddit-markdown-fabric.jpg /.env.example: -------------------------------------------------------------------------------- 1 | REDDIT_CLIENT_ID=your_reddit_client_id_here 2 | REDDIT_CLIENT_SECRET=your_reddit_client_secret_here 3 | REDDIT_USER_AGENT=script:reddit2text:v1.0 (by /u/your_reddit_username) # is typically a descriptive string that tells Reddit who is making the API request. It should follow a specific format to avoid getting blocked by Reddit -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | output/ 2 | .idea/ 3 | 4 | # Byte-compiled / optimized / DLL files 5 | __pycache__/ 6 | *.py[cod] 7 | *$py.class 8 | 9 | # C extensions 10 | *.so 11 | 12 | # Distribution / packaging 13 | .Python 14 | build/ 15 | develop-eggs/ 16 | dist/ 17 | downloads/ 18 | eggs/ 19 | .eggs/ 20 | lib/ 21 | lib64/ 22 | parts/ 23 | sdist/ 24 | var/ 25 | wheels/ 26 | pip-wheel-metadata/ 27 | share/python-wheels/ 28 | *.egg-info/ 29 | .installed.cfg 30 | *.egg 31 | MANIFEST 32 | 33 | # PyInstaller 34 | # Usually these files are written by a python script from a template 35 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 36 | *.manifest 37 | *.spec 38 | 39 | # Installer logs 40 | pip-log.txt 41 | pip-delete-this-directory.txt 42 | 43 | # Unit test / coverage reports 44 | htmlcov/ 45 | .tox/ 46 | .nox/ 47 | .coverage 48 | .coverage.* 49 | .cache 50 | nosetests.xml 51 | coverage.xml 52 | *.cover 53 | *.py,cover 54 | .hypothesis/ 55 | .pytest_cache/ 56 | 57 | # Translations 58 | *.mo 59 | *.pot 60 | 61 | # Django stuff: 62 | *.log 63 | local_settings.py 64 | db.sqlite3 65 | db.sqlite3-journal 66 | 67 | # Flask stuff: 68 | instance/ 69 | .webassets-cache 70 | 71 | # Scrapy stuff: 72 | .scrapy 73 | 74 | # Sphinx documentation 75 | docs/_build/ 76 | 77 | # PyBuilder 78 | target/ 79 | 80 | # Jupyter Notebook 81 | .ipynb_checkpoints 82 | 83 | # IPython 84 | profile_default/ 85 | ipython_config.py 86 | 87 | # pyenv 88 | .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 98 | __pypackages__/ 99 | 100 | # Celery stuff 101 | celerybeat-schedule 102 | celerybeat.pid 103 | 104 | # SageMath parsed files 105 | *.sage.py 106 | 107 | # Environments 108 | .env 109 | .venv 110 | env/ 111 | venv/ 112 | ENV/ 113 | env.bak/ 114 | venv.bak/ 115 | 116 | # Spyder project settings 117 | .spyderproject 118 | .spyproject 119 | 120 | # Rope project settings 121 | .ropeproject 122 | 123 | # mkdocs documentation 124 | /site 125 | 126 | # mypy 127 | .mypy_cache/ 128 | .dmypy.json 129 | dmypy.json 130 | 131 | # Pyre type checker 132 | .pyre/ 133 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Štefan Hosťovecký | weyou.sk 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Stehos Reddit Markdown With Fabric 2 | ![cover](./stehos-reddit-markdown-fabric.jpg) 3 | 4 | ## Description 5 | A simple tool to extract text from Reddit posts, convert them into markdown files, and process them with Fabric. The tool uses multiple Python scripts to achieve this. 6 | 7 | ### iTerm in action 8 | ![iTerm demo](./output.gif) 9 | 10 | ### iOS Shortcut in action 11 | ![iPhone Shortcut](./output2.gif) 12 | 13 | ## Apple Shortcut for Reddit App Integration 14 | To launch this directly from your iPhone via SSH, use the following link: [Apple Shortcut - Reddit to Markdown](https://www.icloud.com/shortcuts/5c37b952172445238cccb42d7813d00b). 15 | 16 | ## CLI Usage 17 | To run the main script, use the following command: 18 | 19 | ```bash 20 | ./run-redditpost.sh <reddit_url> <output_directory> [pattern_name] 21 | ``` 22 | 23 | ### Arguments 24 | - **reddit_url**: The URL of the Reddit post to process. 25 | - **output_directory**: Directory to save the generated markdown file. 26 | - **pattern_name** (optional): Fabric pattern name for summaries; defaults to `summarize`. Use `"0"` to skip Fabric processing. 27 | 28 | ### Script Insights 29 | **run-redditpost.sh** 30 | 31 | This shell script orchestrates the process by: 32 | - Validating input arguments. 33 | - Activating a Python virtual environment. 34 | - Running `redditpost.py` to retrieve Reddit text and convert it to markdown. 35 | - Adding a Fabric summary to the markdown file unless the pattern name is `"0"`. 36 | 37 | **redditpost.py** 38 | - Handles argument parsing for the URL and output path. 39 | - Runs `reddit2txt.py` to fetch text content from Reddit. 40 | - Passes the output to `reddit2md.py` to convert text into a markdown file. 41 | 42 | **reddit2txt.py** 43 | - Uses Reddit API credentials to extract text from a specified Reddit post URL. 44 | 45 | Ensure you have the following environment variables for Reddit API access in your `.env` or shell environment: 46 | 47 | - `REDDIT_CLIENT_ID` 48 | - `REDDIT_CLIENT_SECRET` 49 | - `REDDIT_USER_AGENT` 50 | 51 | **reddit2md.py** 52 | - Converts the plain text from `reddit2txt.py` into markdown, including post title, author, upvotes, body, and comments. 53 |
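For reference, the same flow can be driven by hand. The snippet below is a minimal sketch only (the URL and output directory are the same illustrative values used later in this README); it assumes you run it from the cloned repository folder, with the virtual environment from the Installation section active and the Reddit API credentials available via `.env` or the shell environment.

```bash
# Fetch the post as plain text (reddit2txt.py prints it to stdout).
post_text=$(python3 reddit2txt.py "https://www.reddit.com/r/selfhosted/comments/1g8jytd/best_firewall_for_debian/")

# Convert the text to markdown; reddit2md.py prints "Markdown saved as: <path>".
md_file=$(python3 reddit2md.py "$post_text" ./test | sed -n 's/^Markdown saved as: //p')

# Optionally summarize with Fabric, the same way run-redditpost.sh does.
fabric --pattern summarize < "$md_file"
```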
54 | ## Requirements 55 | 1. [Fabric](https://github.com/danielmiessler/fabric) installed and configured for summaries. 56 | 2. Python 3.x with required packages (install via pip). 57 | 3. Reddit API credentials for access. 58 | 59 | ## Installation 60 | 61 | 1. Install Fabric and generate your Reddit API credentials (see the [reddit2text](https://github.com/NFeruch/reddit2text) documentation). 62 | 2. Clone this repository to your server. 63 | 3. Create a virtual environment in your app folder: `python3 -m venv path/to/venv`. 64 | 4. Activate the environment via `source path/to/venv/bin/activate`. 65 | 5. Install dependencies: `pip3 install -r requirements.txt`. 66 | 6. Update `~/.bash_profile` or `~/.zshrc` to run the script from anywhere: 67 | - Add: `export PATH=$PATH:/path/to/your/cloned/folder`. 68 |
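Put together, the installation steps look roughly like this (a sketch; the clone location and venv path are placeholders to adapt to your setup):

```bash
# Clone the repository and enter it.
git clone https://github.com/Stehos/stehos-reddit-markdown-with-fabric.git
cd stehos-reddit-markdown-with-fabric

# Create and activate a virtual environment, then install the dependencies.
python3 -m venv path/to/venv
source path/to/venv/bin/activate
pip3 install -r requirements.txt

# Provide the Reddit API credentials: copy the template, then edit .env with your values.
cp .env.example .env

# Make the wrapper callable from anywhere (add this line to ~/.bash_profile or ~/.zshrc).
export PATH=$PATH:/path/to/your/cloned/folder
```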
69 | Run the script with: 70 | 71 | ```bash 72 | run-redditpost.sh https://www.reddit.com/r/selfhosted/comments/1g8jytd/best_firewall_for_debian/ ./test 73 | ``` 74 | 75 | ### Notes 76 | To use a custom Fabric pattern, change the `pattern_name` argument in the `run-redditpost.sh` command. You can add new custom patterns in `~/.config/fabric/my_patterns`. 77 | 78 | ## License 79 | 80 | This project is licensed under the MIT License. See the [LICENSE](LICENSE.txt) file for details. 81 | 82 | ## Support 83 | 84 | [![Ko-Fi](https://img.shields.io/badge/Ko--fi-F16061?style=for-the-badge&logo=ko-fi&logoColor=white)](https://ko-fi.com/stehos) 85 | 86 | # Changelog - 30.10.2024 87 | 88 | ### Added 89 | - Support for an optional third argument `pattern_name` in `run-redditpost.sh`. This allows specifying a custom Fabric pattern for processing summaries, defaulting to `"summarize"` if no pattern is provided. 90 | - Condition to skip Fabric processing if `pattern_name` is set to `"0"`. 91 | - Updated usage instructions for `run-redditpost.sh` to reflect new syntax: 92 | ```bash 93 | ./run-redditpost.sh <reddit_url> <output_directory> [pattern_name] 94 | ``` -------------------------------------------------------------------------------- /output.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Stehos/stehos-reddit-markdown-with-fabric/b7adf7b9a305cdbb738dea8f24a414228a7d503e/output.gif -------------------------------------------------------------------------------- /output2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Stehos/stehos-reddit-markdown-with-fabric/b7adf7b9a305cdbb738dea8f24a414228a7d503e/output2.gif -------------------------------------------------------------------------------- /reddit2md.py: -------------------------------------------------------------------------------- 1 | import re 2 | import os 3 | import argparse 4 | from slugify import slugify 5 | 6 | def ensure_directory_exists(folder): 7 | if not os.path.exists(folder): 8 | os.makedirs(folder) 9 | 10 | def generate_unique_filename(folder, title): 11 | # Generate a base filename from the title 12 | base_filename = slugify(title) + ".md" 13 | filepath = os.path.join(folder, base_filename) 14 | 15 | # Check if the file already exists, and append a number if necessary 16 | if os.path.exists(filepath): 17 | # If the file exists, append a number to the filename 18 | counter = 1 19 | while os.path.exists(filepath): 20 | # Create new filename with number appended 21 | new_filename = f"{slugify(title)}-{counter}.md" 22 | filepath = os.path.join(folder, new_filename) 23 | counter += 1 24 | 25 | return filepath 26 | 27 | # Function to parse the basic sections 28 | # Function to parse the basic sections with default values for missing fields 29 | def parse_basic_info(text): 30 | # Safely search for each section 31 | title_match = re.search(r'Title: (.*)', text) 32 | author_match = re.search(r'Author: (.*)', text) 33 | upvotes_match = re.search(r'Upvotes: (\d+)', text) 34 | body_match = re.search(r'Body text: (.*?)(\d+ Comments)', text, re.DOTALL) 35 | comments_match = re.search(r'(\d+) Comments:\n--------\n(.+)', text, re.DOTALL) 36 | 37 | # Extract values or set defaults 38 | title = title_match.group(1) if title_match else "Unknown Title" 39 | author = author_match.group(1) if author_match else "Unknown Author" 40 | upvotes = int(upvotes_match.group(1)) if upvotes_match else 0 41 | body_text = body_match.group(1).strip() if body_match else "No body text available." 42 | comment_count = int(comments_match.group(1)) if comments_match else 0 43 | raw_comments = comments_match.group(2).strip() if comments_match else "No comments available." 44 | 45 | return { 46 | "Title": title, 47 | "Author": author, 48 | "Upvotes": upvotes, 49 | "Body text": body_text, 50 | "Comment count": comment_count, 51 | "Raw comments": raw_comments 52 | } 53 | 54 | 55 | # Function to parse comments and threads 56 | def parse_comments(raw_comments): 57 | comments = [] 58 | pattern = re.compile(r'\| (.*?) \((\d+) upvotes\): (.*)') 59 | subcomment_pattern = re.compile(r'\| \| (.*?) 
\((\d+) upvotes\): (.*)') 60 | 61 | current_comment = None 62 | 63 | for line in raw_comments.splitlines(): 64 | match = pattern.match(line) 65 | sub_match = subcomment_pattern.match(line) 66 | 67 | if match: 68 | if current_comment: 69 | comments.append(current_comment) 70 | current_comment = { 71 | 'author': match.group(1), 72 | 'upvotes': int(match.group(2)), 73 | 'text': match.group(3), 74 | 'replies': [] 75 | } 76 | elif sub_match: 77 | reply = { 78 | 'author': sub_match.group(1), 79 | 'upvotes': int(sub_match.group(2)), 80 | 'text': sub_match.group(3), 81 | } 82 | if current_comment: 83 | current_comment['replies'].append(reply) 84 | 85 | if current_comment: 86 | comments.append(current_comment) 87 | 88 | return comments 89 | 90 | def save_as_markdown(parsed_data, comments, folder="markdown_files"): 91 | # Create folder if it doesn't exist 92 | ensure_directory_exists(folder) 93 | filepath = generate_unique_filename(folder, parsed_data["Title"]) 94 | 95 | # Start building the markdown content 96 | markdown_content = f"# {parsed_data['Title']}\n\n" 97 | markdown_content += f"**Author**: {parsed_data['Author']}\n" 98 | markdown_content += f"**Upvotes**: {parsed_data['Upvotes']}\n\n" 99 | markdown_content += f"## Body\n{parsed_data['Body text']}\n\n" 100 | markdown_content += f"## Comments ({parsed_data['Comment count']})\n" 101 | 102 | # Function to recursively format comments and replies 103 | def format_comment(comment, indent_level=0): 104 | indent = " " * indent_level 105 | comment_text = f"{indent}- **{comment['author']}** ({comment['upvotes']} upvotes): {comment['text']}\n" 106 | if comment['replies']: 107 | for reply in comment['replies']: 108 | comment_text += format_comment(reply, indent_level + 1) 109 | return comment_text 110 | 111 | # Add comments to markdown 112 | for comment in comments: 113 | markdown_content += format_comment(comment) 114 | 115 | # Write the markdown content to a file 116 | with open(filepath, "w", encoding="utf-8") as f: 117 | f.write(markdown_content) 118 | 119 | print(f"Markdown saved as: {filepath}") 120 | 121 | # Check for input text from command line arguments 122 | parser = argparse.ArgumentParser(description='Create markdown file') 123 | parser.add_argument('text', type=str, help='Content that markdown should be created from.') 124 | parser.add_argument('outputPath', type=str, nargs='?', default='markdown_files', help='The output path for your md file.') 125 | 126 | args = parser.parse_args() 127 | 128 | # Parsing the text 129 | basic_info = parse_basic_info(args.text) 130 | comments = parse_comments(basic_info['Raw comments']) 131 | 132 | save_as_markdown(basic_info, comments, args.outputPath) -------------------------------------------------------------------------------- /reddit2txt.py: -------------------------------------------------------------------------------- 1 | from reddit2text import Reddit2Text 2 | import argparse 3 | import os 4 | from dotenv import load_dotenv 5 | 6 | print("stehos") 7 | 8 | # Load environment variables from .env file 9 | load_dotenv() 10 | 11 | # Get Reddit API credentials from environment variables 12 | client_id = os.getenv('REDDIT_CLIENT_ID') 13 | client_secret = os.getenv('REDDIT_CLIENT_SECRET') 14 | user_agent = os.getenv('REDDIT_USER_AGENT') 15 | 16 | # Set up command-line argument parsing 17 | parser = argparse.ArgumentParser(description='Get text from a Reddit post.') 18 | parser.add_argument('url', type=str, help='The URL of the Reddit post to textualize.') 19 | 20 | args = parser.parse_args() 21 | 22 | 
r2t = Reddit2Text( 23 | client_id=client_id, 24 | client_secret=client_secret, 25 | user_agent=user_agent 26 | ) 27 | 28 | # Get the URL from the command-line argument 29 | URL = args.url 30 | 31 | # Textualize the Reddit post 32 | output = r2t.textualize_post(URL) 33 | print(output) 34 | -------------------------------------------------------------------------------- /redditpost.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import subprocess 3 | import argparse 4 | import os 5 | 6 | def main(url, outputPath): 7 | # Find the directory where this script is located 8 | script_dir = os.path.dirname(os.path.abspath(__file__)) 9 | 10 | # Paths to the other scripts (reddit2txt.py and reddit2md.py) 11 | reddit2txt_path = os.path.join(script_dir, 'reddit2txt.py') 12 | reddit2md_path = os.path.join(script_dir, 'reddit2md.py') 13 | 14 | try: 15 | # Run the first script (reddit2txt.py) 16 | result = subprocess.run(['python3', reddit2txt_path, url], 17 | capture_output=True, text=True, check=True) 18 | output = result.stdout.strip() # Get the output from the first script 19 | 20 | # Now call the second script (reddit2md.py) with the output 21 | subprocess.run(['python3', reddit2md_path, output, outputPath], check=True) 22 | 23 | except subprocess.CalledProcessError as e: 24 | print(f"An error occurred while running the script: {e}") 25 | sys.exit(1) 26 | 27 | if __name__ == "__main__": 28 | parser = argparse.ArgumentParser(description='Fetch a Reddit post and convert it to a markdown file.') 29 | parser.add_argument('url', type=str, help='The URL of the Reddit post to textualize.') 30 | parser.add_argument('outputPath', type=str, help='The output path for your md file.') 31 | args = parser.parse_args() 32 | main(args.url, args.outputPath) 33 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | python-dotenv==1.0.1 2 | python-slugify==8.0.4 3 | reddit2text==0.0.9 4 | -------------------------------------------------------------------------------- /run-redditpost: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Find the directory where this script is located 4 | SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" 5 | 6 | # Path to the virtual environment relative to the script's directory 7 | VENV_PATH="$SCRIPT_DIR/path/to/venv/bin/activate" 8 | 9 | # Path to the redditpost.py script 10 | PYTHON_SCRIPT="$SCRIPT_DIR/redditpost.py" 11 | 12 | # Check if at least two arguments are provided 13 | if [ "$#" -lt 2 ]; then 14 | echo "Usage: $0 <reddit_url> <output_directory> [pattern_name]" 15 | exit 1 16 | fi 17 | 18 | # Extract the arguments 19 | reddit_url=$1 20 | output_dir=$2 21 | pattern_name=${3:-"summarize"} # Default pattern name is "summarize" if not provided 22 | 23 | # Use curl to follow redirects and print the final URL 24 | final_url=$(curl -Ls -o /dev/null -w %{url_effective} "$reddit_url") 25 | 26 | # Activate the virtual environment 27 | if [ -f "$VENV_PATH" ]; then 28 | source "$VENV_PATH" 29 | else 30 | echo "Error: Unable to find virtual environment at $VENV_PATH" 31 | exit 1 32 | fi 33 | 34 | # Run the Python script and capture its output 35 | if [ -f "$PYTHON_SCRIPT" ]; then 36 | output=$(python3 "$PYTHON_SCRIPT" "$final_url" "$output_dir") 37 | else 38 | echo "Error: redditpost.py script not found at $PYTHON_SCRIPT" 39 | exit 1 40 | fi 41 | 42 | echo "$output" 43 | 44 | # Extract the file path 
from the output 45 | file=$(echo "$output" | sed -n 's/^Markdown saved as: //p') 46 | 47 | # Ensure the file path was correctly captured 48 | if [ -z "$file" ]; then 49 | echo "Error: Unable to determine the saved markdown file." 50 | exit 1 51 | fi 52 | 53 | # Ensure the file exists before proceeding 54 | if [ ! -f "$file" ]; then 55 | echo "Error: Markdown file not found: $file" 56 | exit 1 57 | fi 58 | 59 | # Apply Fabric only if pattern_name is not "0" 60 | if [ "$pattern_name" != "0" ]; then 61 | echo "Processing Fabric with pattern '$pattern_name'... Please wait..." 62 | 63 | # Capture the output of Fabric directly from the file content 64 | fabric_output=$(fabric --pattern "$pattern_name" < "$file") 65 | 66 | # Prepend the Fabric output to the original file content 67 | original_content=$(cat "$file") 68 | echo -e "# Fabric Summary\n\n$fabric_output\n\n$original_content" > "$file" 69 | 70 | echo "Updated file with Fabric summary at the beginning: $file" 71 | else 72 | echo "No Fabric action applied as pattern name is '0'." 73 | fi 74 | -------------------------------------------------------------------------------- /stehos-reddit-markdown-fabric.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Stehos/stehos-reddit-markdown-with-fabric/b7adf7b9a305cdbb738dea8f24a414228a7d503e/stehos-reddit-markdown-fabric.jpg --------------------------------------------------------------------------------