├── .env.example ├── .gitignore ├── LICENSE.txt ├── README.md ├── output.gif ├── output2.gif ├── reddit2md.py ├── reddit2txt.py ├── redditpost.py ├── requirements.txt ├── run-redditpost └── stehos-reddit-markdown-fabric.jpg /.env.example: -------------------------------------------------------------------------------- 1 | REDDIT_CLIENT_ID=your_reddit_client_id_here 2 | REDDIT_CLIENT_SECRET=your_reddit_client_secret_here 3 | REDDIT_USER_AGENT=script:reddit2text:v1.0 (by /u/your_reddit_username) # is typically a descriptive string that tells Reddit who is making the API request. It should follow a specific format to avoid getting blocked by Reddit -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | output/ 2 | .idea/ 3 | 4 | # Byte-compiled / optimized / DLL files 5 | __pycache__/ 6 | *.py[cod] 7 | *$py.class 8 | 9 | # C extensions 10 | *.so 11 | 12 | # Distribution / packaging 13 | .Python 14 | build/ 15 | develop-eggs/ 16 | dist/ 17 | downloads/ 18 | eggs/ 19 | .eggs/ 20 | lib/ 21 | lib64/ 22 | parts/ 23 | sdist/ 24 | var/ 25 | wheels/ 26 | pip-wheel-metadata/ 27 | share/python-wheels/ 28 | *.egg-info/ 29 | .installed.cfg 30 | *.egg 31 | MANIFEST 32 | 33 | # PyInstaller 34 | # Usually these files are written by a python script from a template 35 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 36 | *.manifest 37 | *.spec 38 | 39 | # Installer logs 40 | pip-log.txt 41 | pip-delete-this-directory.txt 42 | 43 | # Unit test / coverage reports 44 | htmlcov/ 45 | .tox/ 46 | .nox/ 47 | .coverage 48 | .coverage.* 49 | .cache 50 | nosetests.xml 51 | coverage.xml 52 | *.cover 53 | *.py,cover 54 | .hypothesis/ 55 | .pytest_cache/ 56 | 57 | # Translations 58 | *.mo 59 | *.pot 60 | 61 | # Django stuff: 62 | *.log 63 | local_settings.py 64 | db.sqlite3 65 | db.sqlite3-journal 66 | 67 | # Flask stuff: 68 | instance/ 69 | .webassets-cache 70 | 71 | # Scrapy stuff: 72 | .scrapy 73 | 74 | # Sphinx documentation 75 | docs/_build/ 76 | 77 | # PyBuilder 78 | target/ 79 | 80 | # Jupyter Notebook 81 | .ipynb_checkpoints 82 | 83 | # IPython 84 | profile_default/ 85 | ipython_config.py 86 | 87 | # pyenv 88 | .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 98 | __pypackages__/ 99 | 100 | # Celery stuff 101 | celerybeat-schedule 102 | celerybeat.pid 103 | 104 | # SageMath parsed files 105 | *.sage.py 106 | 107 | # Environments 108 | .env 109 | .venv 110 | env/ 111 | venv/ 112 | ENV/ 113 | env.bak/ 114 | venv.bak/ 115 | 116 | # Spyder project settings 117 | .spyderproject 118 | .spyproject 119 | 120 | # Rope project settings 121 | .ropeproject 122 | 123 | # mkdocs documentation 124 | /site 125 | 126 | # mypy 127 | .mypy_cache/ 128 | .dmypy.json 129 | dmypy.json 130 | 131 | # Pyre type checker 132 | .pyre/ 133 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Štefan Hosťovecký | weyou.sk 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Stehos Reddit Markdown With Fabric 2 | ![cover](./stehos-reddit-markdown-fabric.jpg) 3 | 4 | ## Description 5 | A simple tool to extract text from Reddit posts, convert them into markdown files, and process them with Fabric. The tool uses multiple Python scripts to achieve this. 6 | 7 | ### iTerm in action 8 | ![iTerm demo](./output.gif) 9 | 10 | ### iOS Shortcut in action 11 | ![iPhone Shortcut](./output2.gif) 12 | 13 | ## Apple Shortcut for Reddit App Integration 14 | To launch this directly from your iPhone via SSH, use the following link: [Apple Shortcut - Reddit to Markdown](https://www.icloud.com/shortcuts/5c37b952172445238cccb42d7813d00b). 15 | 16 | ## CLI Usage 17 | To run the main script, use the following command: 18 | 19 | ```bash 20 | ./run-redditpost.sh <reddit_url> <output_directory> [pattern_name] 21 | ``` 22 | 23 | ### Arguments 24 | - **reddit_url**: The URL of the Reddit post to process. 25 | - **output_directory**: Directory to save the generated markdown file. 26 | - **pattern_name** (optional): Fabric pattern name for summaries; defaults to `summarize`. Use `"0"` to skip Fabric processing. 27 | 28 | ### Script Insights 29 | **run-redditpost.sh** 30 | 31 | This shell script orchestrates the process by: 32 | - Validating input arguments. 33 | - Activating a Python virtual environment. 34 | - Running `redditpost.py` to retrieve Reddit text and convert it to markdown. 35 | - Adding a Fabric summary to the markdown file unless the pattern name is `"0"`. 36 | 37 | **redditpost.py** 38 | - Handles argument parsing for the URL and output path. 39 | - Runs `reddit2txt.py` to fetch text content from Reddit. 40 | - Passes the output to `reddit2md.py` to convert text into a markdown file. 41 | 42 | **reddit2txt.py** 43 | - Uses Reddit API credentials to extract text from a specified Reddit post URL. 44 | 45 | Ensure you have the following environment variables for Reddit API access in your `.env` or shell environment: 46 | 47 | - `REDDIT_CLIENT_ID` 48 | - `REDDIT_CLIENT_SECRET` 49 | - `REDDIT_USER_AGENT` 50 | 51 | **reddit2md.py** 52 | - Converts the plain text from `reddit2txt.py` into markdown, including post title, author, upvotes, body, and comments. 53 |
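For reference, the same flow can be driven by hand. The snippet below is a minimal sketch only (the URL and output directory are the same illustrative values used later in this README); it assumes you run it from the cloned repository folder, with the virtual environment from the Installation section active and the Reddit API credentials available via `.env` or the shell environment.

```bash
# Fetch the post as plain text (reddit2txt.py prints it to stdout).
post_text=$(python3 reddit2txt.py "https://www.reddit.com/r/selfhosted/comments/1g8jytd/best_firewall_for_debian/")

# Convert the text to markdown; reddit2md.py prints "Markdown saved as: <path>".
md_file=$(python3 reddit2md.py "$post_text" ./test | sed -n 's/^Markdown saved as: //p')

# Optionally summarize with Fabric, the same way run-redditpost.sh does.
fabric --pattern summarize < "$md_file"
```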
54 | ## Requirements 55 | 1. [Fabric](https://github.com/danielmiessler/fabric) installed and configured for summaries. 56 | 2. Python 3.x with required packages (install via pip). 57 | 3. Reddit API credentials for access. 58 | 59 | ## Installation 60 | 61 | 1. Install Fabric and generate your Reddit API credentials (see the [reddit2text](https://github.com/NFeruch/reddit2text) documentation). 62 | 2. Clone this repository to your server. 63 | 3. Create a virtual environment in your app folder: `python3 -m venv path/to/venv`. 64 | 4. Activate the environment via `source path/to/venv/bin/activate`. 65 | 5. Install dependencies: `pip3 install -r requirements.txt`. 66 | 6. Update `~/.bash_profile` or `~/.zshrc` to run the script from anywhere: 67 | - Add: `export PATH=$PATH:/path/to/your/cloned/folder`. 68 |
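Put together, the installation steps look roughly like this (a sketch; the clone location and venv path are placeholders to adapt to your setup):

```bash
# Clone the repository and enter it.
git clone https://github.com/Stehos/stehos-reddit-markdown-with-fabric.git
cd stehos-reddit-markdown-with-fabric

# Create and activate a virtual environment, then install the dependencies.
python3 -m venv path/to/venv
source path/to/venv/bin/activate
pip3 install -r requirements.txt

# Provide the Reddit API credentials: copy the template, then edit .env with your values.
cp .env.example .env

# Make the wrapper callable from anywhere (add this line to ~/.bash_profile or ~/.zshrc).
export PATH=$PATH:/path/to/your/cloned/folder
```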
69 | Run the script with: 70 | 71 | ```bash 72 | run-redditpost.sh https://www.reddit.com/r/selfhosted/comments/1g8jytd/best_firewall_for_debian/ ./test 73 | ``` 74 | 75 | ### Notes 76 | To use a custom Fabric pattern, change the `pattern_name` argument in the `run-redditpost.sh` command. You can add new custom patterns in `~/.config/fabric/my_patterns`. 77 | 78 | ## License 79 | 80 | This project is licensed under the MIT License. See the [LICENSE](LICENSE.txt) file for details. 81 | 82 | ## Support 83 | 84 | [![Ko-Fi](https://img.shields.io/badge/Ko--fi-F16061?style=for-the-badge&logo=ko-fi&logoColor=white)](https://ko-fi.com/stehos) 85 | 86 | # Changelog - 30.10.2024 87 | 88 | ### Added 89 | - Support for an optional third argument `pattern_name` in `run-redditpost.sh`. This allows specifying a custom Fabric pattern for processing summaries, defaulting to `"summarize"` if no pattern is provided. 90 | - Condition to skip Fabric processing if `pattern_name` is set to `"0"`. 91 | - Updated usage instructions for `run-redditpost.sh` to reflect new syntax: 92 | ```bash 93 | ./run-redditpost.sh <reddit_url> <output_directory> [pattern_name] 94 | ``` -------------------------------------------------------------------------------- /output.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Stehos/stehos-reddit-markdown-with-fabric/b7adf7b9a305cdbb738dea8f24a414228a7d503e/output.gif -------------------------------------------------------------------------------- /output2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Stehos/stehos-reddit-markdown-with-fabric/b7adf7b9a305cdbb738dea8f24a414228a7d503e/output2.gif -------------------------------------------------------------------------------- /reddit2md.py: -------------------------------------------------------------------------------- 1 | import re 2 | import os 3 | import argparse 4 | from slugify import slugify 5 | 6 | def ensure_directory_exists(folder): 7 | if not os.path.exists(folder): 8 | os.makedirs(folder) 9 | 10 | def generate_unique_filename(folder, title): 11 | # Generate a base filename from the title 12 | base_filename = slugify(title) + ".md" 13 | filepath = os.path.join(folder, base_filename) 14 | 15 | # Check if the file already exists, and append a number if necessary 16 | if os.path.exists(filepath): 17 | # If the file exists, append a number to the filename 18 | counter = 1 19 | while os.path.exists(filepath): 20 | # Create new filename with number appended 21 | new_filename = f"{slugify(title)}-{counter}.md" 22 | filepath = os.path.join(folder, new_filename) 23 | counter += 1 24 | 25 | return filepath 26 | 27 | # Function to parse the basic sections 28 | # Function to parse the basic sections with default values for missing fields 29 | def parse_basic_info(text): 30 | # Safely search for each section 31 | title_match = re.search(r'Title: (.*)', text) 32 | author_match = re.search(r'Author: (.*)', text) 33 | upvotes_match = re.search(r'Upvotes: (\d+)', text) 34 | body_match = re.search(r'Body text: (.*?)(\d+ Comments)', text, re.DOTALL) 35 | comments_match = re.search(r'(\d+) Comments:\n--------\n(.+)', text, re.DOTALL) 36 | 37 | # Extract values or set defaults 38 | title = title_match.group(1) if title_match else "Unknown Title" 39 | author = author_match.group(1) if author_match else "Unknown Author" 40 | upvotes = int(upvotes_match.group(1)) if upvotes_match else 0 41 | body_text = body_match.group(1).strip() if body_match else "No body text available." 42 | comment_count = int(comments_match.group(1)) if comments_match else 0 43 | raw_comments = comments_match.group(2).strip() if comments_match else "No comments available." 44 | 45 | return { 46 | "Title": title, 47 | "Author": author, 48 | "Upvotes": upvotes, 49 | "Body text": body_text, 50 | "Comment count": comment_count, 51 | "Raw comments": raw_comments 52 | } 53 | 54 | 55 | # Function to parse comments and threads 56 | def parse_comments(raw_comments): 57 | comments = [] 58 | pattern = re.compile(r'\| (.*?) \((\d+) upvotes\): (.*)') 59 | subcomment_pattern = re.compile(r'\| \| (.*?) 
\((\d+) upvotes\): (.*)') 60 | 61 | current_comment = None 62 | 63 | for line in raw_comments.splitlines(): 64 | match = pattern.match(line) 65 | sub_match = subcomment_pattern.match(line) 66 | 67 | if match: 68 | if current_comment: 69 | comments.append(current_comment) 70 | current_comment = { 71 | 'author': match.group(1), 72 | 'upvotes': int(match.group(2)), 73 | 'text': match.group(3), 74 | 'replies': [] 75 | } 76 | elif sub_match: 77 | reply = { 78 | 'author': sub_match.group(1), 79 | 'upvotes': int(sub_match.group(2)), 80 | 'text': sub_match.group(3), 81 | } 82 | if current_comment: 83 | current_comment['replies'].append(reply) 84 | 85 | if current_comment: 86 | comments.append(current_comment) 87 | 88 | return comments 89 | 90 | def save_as_markdown(parsed_data, comments, folder="markdown_files"): 91 | # Create folder if it doesn't exist 92 | ensure_directory_exists(folder) 93 | filepath = generate_unique_filename(folder, parsed_data["Title"]) 94 | 95 | # Start building the markdown content 96 | markdown_content = f"# {parsed_data['Title']}\n\n" 97 | markdown_content += f"**Author**: {parsed_data['Author']}\n" 98 | markdown_content += f"**Upvotes**: {parsed_data['Upvotes']}\n\n" 99 | markdown_content += f"## Body\n{parsed_data['Body text']}\n\n" 100 | markdown_content += f"## Comments ({parsed_data['Comment count']})\n" 101 | 102 | # Function to recursively format comments and replies 103 | def format_comment(comment, indent_level=0): 104 | indent = " " * indent_level 105 | comment_text = f"{indent}- **{comment['author']}** ({comment['upvotes']} upvotes): {comment['text']}\n" 106 | if comment['replies']: 107 | for reply in comment['replies']: 108 | comment_text += format_comment(reply, indent_level + 1) 109 | return comment_text 110 | 111 | # Add comments to markdown 112 | for comment in comments: 113 | markdown_content += format_comment(comment) 114 | 115 | # Write the markdown content to a file 116 | with open(filepath, "w", encoding="utf-8") as f: 117 | f.write(markdown_content) 118 | 119 | print(f"Markdown saved as: {filepath}") 120 | 121 | # Check for input text from command line arguments 122 | parser = argparse.ArgumentParser(description='Create markdown file') 123 | parser.add_argument('text', type=str, help='Content that markdown should be created from.') 124 | parser.add_argument('outputPath', type=str, nargs='?', default='markdown_files', help='The output path for your md file.') 125 | 126 | args = parser.parse_args() 127 | 128 | # Parsing the text 129 | basic_info = parse_basic_info(args.text) 130 | comments = parse_comments(basic_info['Raw comments']) 131 | 132 | save_as_markdown(basic_info, comments, args.outputPath) -------------------------------------------------------------------------------- /reddit2txt.py: -------------------------------------------------------------------------------- 1 | from reddit2text import Reddit2Text 2 | import argparse 3 | import os 4 | from dotenv import load_dotenv 5 | 6 | print("stehos") 7 | 8 | # Load environment variables from .env file 9 | load_dotenv() 10 | 11 | # Get Reddit API credentials from environment variables 12 | client_id = os.getenv('REDDIT_CLIENT_ID') 13 | client_secret = os.getenv('REDDIT_CLIENT_SECRET') 14 | user_agent = os.getenv('REDDIT_USER_AGENT') 15 | 16 | # Set up command-line argument parsing 17 | parser = argparse.ArgumentParser(description='Get text from a Reddit post.') 18 | parser.add_argument('url', type=str, help='The URL of the Reddit post to textualize.') 19 | 20 | args = parser.parse_args() 21 | 22 | 
r2t = Reddit2Text( 23 | client_id=client_id, 24 | client_secret=client_secret, 25 | user_agent=user_agent 26 | ) 27 | 28 | # Get the URL from the command-line argument 29 | URL = args.url 30 | 31 | # Textualize the Reddit post 32 | output = r2t.textualize_post(URL) 33 | print(output) 34 | -------------------------------------------------------------------------------- /redditpost.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import subprocess 3 | import argparse 4 | import os 5 | 6 | def main(url, outputPath): 7 | # Find the directory where this script is located 8 | script_dir = os.path.dirname(os.path.abspath(__file__)) 9 | 10 | # Paths to the other scripts (reddit2txt.py and reddit2md.py) 11 | reddit2txt_path = os.path.join(script_dir, 'reddit2txt.py') 12 | reddit2md_path = os.path.join(script_dir, 'reddit2md.py') 13 | 14 | try: 15 | # Run the first script (reddit2txt.py) 16 | result = subprocess.run(['python3', reddit2txt_path, url], 17 | capture_output=True, text=True, check=True) 18 | output = result.stdout.strip() # Get the output from the first script 19 | 20 | # Now call the second script (reddit2md.py) with the output 21 | subprocess.run(['python3', reddit2md_path, output, outputPath], check=True) 22 | 23 | except subprocess.CalledProcessError as e: 24 | print(f"An error occurred while running the script: {e}") 25 | sys.exit(1) 26 | 27 | if __name__ == "__main__": 28 | parser = argparse.ArgumentParser(description='Fetch a Reddit post and convert it to a markdown file.') 29 | parser.add_argument('url', type=str, help='The URL of the Reddit post to textualize.') 30 | parser.add_argument('outputPath', type=str, help='The output path for your md file.') 31 | args = parser.parse_args() 32 | main(args.url, args.outputPath) 33 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | python-dotenv==1.0.1 2 | python-slugify==8.0.4 3 | reddit2text==0.0.9 4 | -------------------------------------------------------------------------------- /run-redditpost: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Find the directory where this script is located 4 | SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" 5 | 6 | # Path to the virtual environment relative to the script's directory 7 | VENV_PATH="$SCRIPT_DIR/path/to/venv/bin/activate" 8 | 9 | # Path to the redditpost.py script 10 | PYTHON_SCRIPT="$SCRIPT_DIR/redditpost.py" 11 | 12 | # Check if at least two arguments are provided 13 | if [ "$#" -lt 2 ]; then 14 | echo "Usage: $0 <reddit_url> <output_directory> [pattern_name]" 15 | exit 1 16 | fi 17 | 18 | # Extract the arguments 19 | reddit_url=$1 20 | output_dir=$2 21 | pattern_name=${3:-"summarize"} # Default pattern name is "summarize" if not provided 22 | 23 | # Use curl to follow redirects and print the final URL 24 | final_url=$(curl -Ls -o /dev/null -w %{url_effective} "$reddit_url") 25 | 26 | # Activate the virtual environment 27 | if [ -f "$VENV_PATH" ]; then 28 | source "$VENV_PATH" 29 | else 30 | echo "Error: Unable to find virtual environment at $VENV_PATH" 31 | exit 1 32 | fi 33 | 34 | # Run the Python script and capture its output 35 | if [ -f "$PYTHON_SCRIPT" ]; then 36 | output=$(python3 "$PYTHON_SCRIPT" "$final_url" "$output_dir") 37 | else 38 | echo "Error: redditpost.py script not found at $PYTHON_SCRIPT" 39 | exit 1 40 | fi 41 | 42 | echo "$output" 43 | 44 | # Extract the file path 
from the output 45 | file=$(echo "$output" | sed -n 's/^Markdown saved as: //p') 46 | 47 | # Ensure the file path was correctly captured 48 | if [ -z "$file" ]; then 49 | echo "Error: Unable to determine the saved markdown file." 50 | exit 1 51 | fi 52 | 53 | # Ensure the file exists before proceeding 54 | if [ ! -f "$file" ]; then 55 | echo "Error: Markdown file not found: $file" 56 | exit 1 57 | fi 58 | 59 | # Apply Fabric only if pattern_name is not "0" 60 | if [ "$pattern_name" != "0" ]; then 61 | echo "Processing Fabric with pattern '$pattern_name'... Please wait..." 62 | 63 | # Capture the output of Fabric directly from the file content 64 | fabric_output=$(fabric --pattern "$pattern_name" < "$file") 65 | 66 | # Prepend the Fabric output to the original file content 67 | original_content=$(cat "$file") 68 | echo -e "# Fabric Summary\n\n$fabric_output\n\n$original_content" > "$file" 69 | 70 | echo "Updated file with Fabric summary at the beginning: $file" 71 | else 72 | echo "No Fabric action applied as pattern name is '0'." 73 | fi 74 | -------------------------------------------------------------------------------- /stehos-reddit-markdown-fabric.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Stehos/stehos-reddit-markdown-with-fabric/b7adf7b9a305cdbb738dea8f24a414228a7d503e/stehos-reddit-markdown-fabric.jpg --------------------------------------------------------------------------------