├── .github └── workflows │ └── main.yml ├── .gitignore ├── ARCHITECTURE.md ├── README.md ├── dashboard.py ├── fossil_mastodon ├── __init__.py ├── algorithm.py ├── app │ ├── index.js │ ├── static │ │ ├── htmx.js │ │ ├── logo-light.svg │ │ ├── page.js │ │ ├── style.css │ │ └── work-in-progress.gif │ └── templates │ │ ├── bad_plugin.html │ │ ├── base │ │ ├── admin.html │ │ └── page.html │ │ ├── index.html │ │ ├── no_algorithm.html │ │ ├── settings.html │ │ ├── toot.html │ │ ├── toot_clusters.html │ │ └── toot_list.html ├── config.py ├── core.py ├── migrations.py ├── plugin_impl │ ├── __init__.py │ ├── toot_debug.py │ └── topic_cluster.py ├── plugins.py ├── science.py ├── server.py └── ui.py ├── index.html ├── make.sh ├── poetry.lock └── pyproject.toml /.github/workflows/main.yml: -------------------------------------------------------------------------------- 1 | name: Upload Python Package to PyPI 2 | 3 | on: 4 | release: 5 | types: [published] 6 | 7 | permissions: 8 | contents: read 9 | 10 | jobs: 11 | deploy: 12 | 13 | runs-on: ubuntu-latest 14 | 15 | steps: 16 | - uses: actions/checkout@v3 17 | - name: Set up Python 18 | uses: actions/setup-python@v3 19 | with: 20 | python-version: '3.x' 21 | - name: Install dependencies 22 | run: | 23 | python -m pip install --upgrade pip 24 | pip install build 25 | - name: Build package 26 | run: python -m build 27 | - name: Publish package 28 | uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29 29 | with: 30 | user: __token__ 31 | password: ${{ secrets.PYPI_API_TOKEN }} 32 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | .env 3 | fossil.db 4 | dist/ 5 | -------------------------------------------------------------------------------- /ARCHITECTURE.md: -------------------------------------------------------------------------------- 1 | 2 | # It's a *Client* 3 
| 4 | Fossil is a Mastodon **client**. While there is a web server that keeps state, it only downloads your home timeline and redisplays it. It doesn't touch posts from accounts or hashtags that you don't follow. 5 | 6 | ```mermaid 7 | graph TD 8 | subgraph fediverse 9 | me[My Mastodon Server] 10 | them[Mastodon Server] 11 | them2[Mastodon Server] 12 | end 13 | them2-->me 14 | them-->me 15 | me-->fossil["Fossil (on my laptop)"]-->phone[My Phone] 16 | ``` 17 | 18 | ## How Do I Connect My Phone? 19 | 20 | Traditionally, you'd deploy fossil into a cloud environment. The problem is that when it's running on my laptop, it doesn't have a public IP address; deploying it in the cloud 21 | lets me use infrastructure in AWS or Azure to serve it with a public IP address and/or a full domain name. 22 | 23 | ### Option 1: Tailscale 24 | [Tailscale](https://tailscale.com/kb/1017/install?slug=kb&slug=1017&slug=install) sets up a mesh network. It's similar to a corporate VPN, except that it's so easy to install that you can trivially pop it 25 | onto your phone, laptop, home lab, cloud lab, IoT devices, etc. 26 | 27 | The setup is: 28 | 29 | 1. Follow Tailscale's directions 30 | 2. Install Tailscale on your laptop 31 | 3. Install Tailscale on your phone 32 | 4. Run fossil on your laptop and note the port 33 | 5. Look up your laptop's hostname 34 | 6. Open Safari/Chrome/Firefox/etc. and enter the hostname and port: `http://{hostname}:{port}` 35 | 36 | This works seamlessly for yourself, and potentially for sharing with a small number of friends and family. 37 | 38 | ### Option 2: ngrok 39 | [Ngrok](https://ngrok.com/docs/getting-started/) works a bit differently from Tailscale.
It gives you a public domain name to reach your app, and then tunnels traffic through to your laptop. 40 | 41 | ```mermaid 42 | graph LR 43 | Phone 44 | subgraph internet 45 | ngrok[ngrok domain name] 46 | end 47 | Phone-->ngrok 48 | fossil[fossil on laptop] 49 | ngrok --> fossil 50 | ``` 51 | 52 | This works better if you want to share it with a lot of people. You can easily tack on authentication, like OAuth. 53 | 54 | 55 | # Code Architecture 56 | It's a fairly standard htmx-on-Python arrangement. 57 | 58 | ```mermaid 59 | graph TD 60 | SQLite[(SQLite)]-->FastAPI[FastAPI on Python] 61 | llm-->FastAPI 62 | FastAPI-->HTML[HTML w/ htmx tags] 63 | ``` 64 | 65 | SQLite stores: 66 | - Toots 67 | - `id`: An internal auto-incrementing ID, not the same as `toot_id`. 68 | - Some other fields parsed from JSON 69 | - `embedding`: The embedding vector, stored as a BLOB. In memory it's kept as a numpy array. 70 | - `orig_json`: The full unaltered JSON that the Mastodon server sent us, stored as TEXT. 71 | - Session 72 | - `id`: The session ID. This is stored in an HTTP cookie when sent to the browser, so all requests can correspond to a session. 73 | - `algorithm_spec`: A JSON object (stored as TEXT) describing the module & class name of the algorithm currently in use. 74 | - `algorithm`: The algorithm serialized via [pickle](https://docs.python.org/3/library/pickle.html), stored as a BLOB. This enables 75 | pluggable algorithms to keep their own state persistently. 76 | 77 | HTTP Cookies 78 | - `fossil_session_id`: The primary key of the `sessions` table. It's created whenever it's empty and never expires. 79 | 80 | ## Code Layout 81 | 82 | - `core.py`: Database access, downloading toots, etc. 83 | - `config.py`: Configuration & wrappers around configuration mechanisms. All config should have either a constant or a simple function. 84 | - [DEPRECATED] `science.py`: Functionality here has been moved to `algorithm/topic_cluster.py` and made more pluggable.
85 | - `server.py`: Entry point. FastAPI app with all core HTTP operations defined. Operations return either a Jinja template or a literal HTML response. 86 | - `ui.py`: Partially deprecated (it contains old Streamlit code). 87 | - `algorithm/` 88 | - `base.py`: Base classes and utilities needed for building algorithm plugins. All algorithms are installed as plugins, even standard ones. 89 | - Remaining files: algorithms, each implementing base classes from `base.py`. 90 | - `app/` 91 | - `static/`: various CSS & JavaScript files 92 | - `style.css`: the only CSS we're writing manually 93 | - `page.js`: The only JS we're writing manually. There's no pre- or post-processing pipeline; it's served exactly as it's stored in Git, comments and all. 94 | - Other files: Things I downloaded 95 | - `templates/`: Jinja templates 96 | - `index.html`: Returned by `GET /` 97 | - `settings.html`: Returned by `GET /settings` 98 | - `toot*.html`: Sub-templates included into `index.html` or returned from XHR endpoints. You can use these when building plugins. 99 | - `base/` 100 | - `page.html`: Base template inherited by `index.html` and (via `base/admin.html`) `settings.html` 101 | 102 | 103 | 104 | # Plugin Architecture 105 | ## Making an Algorithm Plugin 106 | An algorithm plugin involves: 107 | 108 | 1. Algorithm class 109 | 2. [Optional] Renderer class 110 | 3. [Optional] Jinja templates for displaying 111 | 112 | All algorithms are plugins, so you can use [`topic_cluster.py`](https://github.com/tkellogg/fossil/blob/main/fossil_mastodon/algorithm/topic_cluster.py) 113 | as a guide. 114 | 115 | ### Algorithm Class 116 | Use `base.BaseAlgorithm` as a base class and implement these methods: 117 | 118 | - `render(toots, context)`: Convert a list of toots into a `Renderable` object (which converts to an HTTP response) 119 | - `train(context, http_args)`: A classmethod that produces a trained instance of your algorithm class. The assumption is that you're training 120 | some sort of model, e.g.
topic_cluster trains a sklearn `KMeansCluster` model and stores it in a field of the `TopicCluster` 121 | object. Storing it in a field ensures that the model is serialized to and from the database along with the algorithm. 122 | 123 | ### [Optional] Renderer Class 124 | You might not need this if you can find an existing template & renderer that works for you. It should be easy to implement; it's just a 125 | matter of capturing the data you need and then passing it to a template. 126 | 127 | Use `base.Renderable` as a base class and implement this method: 128 | 129 | - `render()`: Returns a FastAPI response, typically a `TemplateResponse`. 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Fossil, a Mastodon Client for Reading 2 | 3 | A Mastodon client optimized for reading, with a configurable and 4 | hackable timeline algorithm powered by Simon Willison's [llm](https://llm.datasette.io/en/stable/index.html) tool. Try making your own algorithm! 5 | 6 | 7 | Sneak peek: 8 | 9 | ![image](https://gist.github.com/assets/437044/3152e5ae-bac7-4a67-a01b-82a712e90e26) 10 | 11 | 12 | # Installing & Running 13 | 14 | ## From PyPI 15 | 16 | I highly suggest not installing any Python app directly into your global Python.
Create a virtual environment: 17 | 18 | ``` 19 | python -m venv fossil 20 | ``` 21 | 22 | Then activate it (see the [venv documentation](https://docs.python.org/3/library/venv.html)): 23 | 24 | ``` 25 | source fossil/bin/activate 26 | ``` 27 | 28 | Once it's active, install the package with `pip install fossil-mastodon`. Alternatively, **use [`pipx`](https://pipx.pypa.io/stable/installation/)**: 29 | 30 | ``` 31 | pip install pipx 32 | pipx install fossil-mastodon 33 | ``` 34 | 35 | ## From Source 36 | 37 | Clone this repo: 38 | 39 | ``` 40 | git clone https://github.com/tkellogg/fossil.git 41 | ``` 42 | 43 | Then `cd fossil` to get into the correct directory, and run `poetry install` to install the dependencies. 44 | 45 | 46 | ## Configure the `.env` file 47 | 48 | Before running the server, you'll need a `.env` file with these keys: 49 | 50 | ``` 51 | ACCESS_TOKEN= 52 | ``` 53 | 54 | Alternatively, you can set them as environment variables. All available keys are here: 55 | 56 | | Variable | Required? | Value | 57 | | --- | --- | --- | 58 | | OPENAI_API_BASE | no (defaults to https://api.openai.com/v1) | e.g. https://api.openai.com/v1 | 59 | | MASTO_BASE | no (defaults to https://hachyderm.io) | Your Mastodon server's base URL, e.g. https://hachyderm.io | 60 | | ACCESS_TOKEN | yes | In your Mastodon UI, create a new "app" and copy the access token here | 61 | 62 | ### Connecting to Mastodon 63 | 64 | To get `MASTO_BASE` and `ACCESS_TOKEN`: 65 | 66 | 1. Go to your Mastodon server's web UI 67 | 2. Preferences -> Development 68 | 3. Click "New Application" 69 | 4. Set the name 70 | 5. Set "Redirect URI" to `urn:ietf:wg:oauth:2.0:oob` 71 | 6. Set scopes to all `read` and `write` (contribution idea: figure out what's strictly necessary and send a pull request to update this) 72 | 7. Click Submit 73 | 8. Copy your access token into `ACCESS_TOKEN` in the `.env` file. 74 | 9. Set `MASTO_BASE`. You should be able to copy the URL from your browser and then remove the entire path (everything from the first `/` after the domain onward). 75 | 76 | ## Usage 77 | 1. Ensure the settings are correct 78 | 2. Click "Load More" to populate the database with toots 79 | 3. Click "Re-Train Algorithm" to categorize and label those toots.
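Before clicking "Load More", it can help to confirm that your `MASTO_BASE` and `ACCESS_TOKEN` values actually work. The sketch below is not part of fossil; `build_verify_request` and `verify_credentials` are hypothetical helpers written for this example. They call Mastodon's `GET /api/v1/accounts/verify_credentials` endpoint with the access token sent as an OAuth bearer token, using only the Python standard library.

```python
import json
import urllib.request


def build_verify_request(masto_base: str, access_token: str) -> urllib.request.Request:
    """Build a GET request for Mastodon's verify_credentials endpoint (hypothetical helper)."""
    # Drop a trailing slash so "https://hachyderm.io/" and "https://hachyderm.io" give the same URL
    url = masto_base.rstrip("/") + "/api/v1/accounts/verify_credentials"
    # The app's access token is sent as an OAuth 2.0 bearer token
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def verify_credentials(masto_base: str, access_token: str, timeout: float = 10.0) -> dict:
    """Return the authenticated account as a dict.

    Raises urllib.error.HTTPError (e.g. 401) if the server rejects the token.
    """
    request = build_verify_request(masto_base, access_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `verify_credentials("https://hachyderm.io", "<your token>")` should return your account as a dict, whose `acct` field holds your username. A 401 error usually means the token was mistyped or was created on a different server than `MASTO_BASE`.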
80 | 81 | # Configure Models 82 | Models can be configured and/or added via `llm`. 83 | 84 | ## OpenAI 85 | Here's how to set your OpenAI API key, which gives you access to OpenAI models: 86 | 87 | ``` 88 | $ llm keys set openai 89 | Enter key: ... 90 | ``` 91 | ## Local (Experimental) 92 | You will need to install an embedding model and a large language model. The instructions here use the `llm-sentence-transformers` and `llm-gpt4all` plugins to do so. 93 | 94 | ```sh 95 | $ llm install llm-sentence-transformers # An Embedding Model Plugin 96 | $ llm install llm-gpt4all # A Large Language Model Plugin 97 | $ llm sentence-transformers register all-mpnet-base-v2 --alias mpnet # Download/Register one of the Embedding Models 98 | ``` 99 | 100 | ### Notes 101 | - The full list of available embedding models combines [the default list](https://www.sbert.net/docs/pretrained_models.html) and [these models from Hugging Face](https://huggingface.co/models?library=sentence-transformers). 102 | - The [llm-gpt4all](https://github.com/simonw/llm-gpt4all) README gives a list of models and their requirements. 103 | - The first time you use a model, `llm` will need to download it.
This will add to the overall processing time. 104 | - The "Re-Train Algorithm" step can take a long time depending on your hardware; a progress bar is shown in the console window. 105 | - The quality of the categorization and labels is not guaranteed. 106 | 107 | ## Run the server 108 | 109 | If you installed from PyPI: 110 | 111 | ``` 112 | uvicorn --host 0.0.0.0 --port 8888 fossil_mastodon.server:app 113 | ``` 114 | 115 | If you installed from source: 116 | 117 | ``` 118 | poetry run uvicorn --host 0.0.0.0 --port 8888 --reload fossil_mastodon.server:app 119 | ``` 120 | 121 | If you're working on CSS or HTML files, add them to the reload watch list: 122 | 123 | ``` 124 | poetry run uvicorn --host 0.0.0.0 --port 8888 --reload --reload-include '*.html' --reload-include '*.css' fossil_mastodon.server:app 125 | ``` 126 | 127 | (Note: `--reload` makes development much easier, but is generally unnecessary if you're not developing.) -------------------------------------------------------------------------------- /dashboard.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | from fossil_mastodon import config, core, science, ui 3 | import streamlit as st 4 | import datetime 5 | import random 6 | 7 | st.title("fossil") 8 | link_style = ui.LinkStyle() 9 | 10 | @st.cache_data 11 | def default_date(): 12 | return datetime.datetime.utcnow() - datetime.timedelta(days=1) 13 | 14 | @st.cache_data 15 | def get_toots(_cache_key: int, timeline_since, n_clusters) -> list[core.Toot]: 16 | print("get_toots", _cache_key, st.session_state.cache_key, "since=", datetime.datetime.utcnow() - timeline_since) 17 | toots = core.Toot.get_toots_since(datetime.datetime.utcnow() - timeline_since) 18 | if len(toots) > 0: 19 | ui.all_toot_summary(toots) 20 | science.assign_clusters(st.session_state['id'], toots, n_clusters=n_clusters) 21 | return toots 22 | 23 | # Refresh button 24 | latest_date = core.Toot.get_latest_date() 25 | if
latest_date is None: 26 | is_refreshing = st.button("Download toots") 27 | if is_refreshing: 28 | with st.spinner("Downloading toots..."): 29 | core.create_database() 30 | core.download_timeline(datetime.datetime.utcnow() - datetime.timedelta(days=1), st.session_state['id']) 31 | latest_date = core.Toot.get_latest_date() 32 | st.session_state.cache_key = random.randint(0, 10000) 33 | else: 34 | is_refreshing = st.button("Refresh toots") 35 | if is_refreshing: 36 | with st.spinner("Downloading toots..."): 37 | core.create_database() 38 | core.download_timeline(latest_date, st.session_state['id']) 39 | st.session_state.cache_key = random.randint(0, 10000) 40 | 41 | # customize timeline segment to analyze 42 | timeline_since = ui.get_time_frame() 43 | 44 | # customize clustering algo 45 | n_clusters = st.slider("Number of clusters", 2, 20, 15) 46 | 47 | if "cache_key" not in st.session_state: 48 | print("init cache_key", st.session_state) 49 | st.session_state.cache_key = random.randint(0, 10000) 50 | 51 | if st.button("Show"): 52 | st.session_state.cache_key = random.randint(0, 10000) 53 | 54 | print(f"state: {st.session_state.cache_key}") 55 | 56 | toots = get_toots(st.session_state.cache_key, timeline_since, n_clusters) 57 | clusters = sorted(list({t.cluster for t in toots if t.cluster})) 58 | if len(toots) == 0: 59 | st.markdown("No toots found. 
Try clicking **Download toots** or **Refresh toots** above and then click **Show**.") 60 | else: 61 | for cluster in clusters: 62 | cluster_count = len([t for t in toots if t.cluster == cluster]) 63 | with st.expander(f"{cluster} ({cluster_count} toots)"): 64 | for toot in toots: 65 | if toot.cluster == cluster: 66 | ui.display_toot(toot, link_style) 67 | -------------------------------------------------------------------------------- /fossil_mastodon/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tkellogg/fossil/89db2fdbea96666f101e6e16e287fa23fcee0b9d/fossil_mastodon/__init__.py -------------------------------------------------------------------------------- /fossil_mastodon/algorithm.py: -------------------------------------------------------------------------------- 1 | import abc 2 | import datetime 3 | import pickle 4 | import sqlite3 5 | import typing 6 | 7 | import pydantic 8 | from fastapi import Response, responses 9 | 10 | from fossil_mastodon import config, core 11 | if typing.TYPE_CHECKING: 12 | from fossil_mastodon import plugins 13 | 14 | 15 | class Renderable(abc.ABC): 16 | """ 17 | A base class for a "shape" of data to be rendered as HTML. 18 | """ 19 | @abc.abstractmethod 20 | def render(self, **response_args) -> Response: 21 | """ 22 | Render this object as a FastAPI Response. 23 | 24 | :param response_args: Additional arguments to pass to the Response constructor. 25 | """ 26 | raise NotImplementedError() 27 | 28 | 29 | class TrainContext(pydantic.BaseModel): 30 | """ 31 | A context object for training a model. This is passed to train(). 
32 | """ 33 | end_time: datetime.datetime 34 | timedelta: datetime.timedelta 35 | session_id: str 36 | 37 | def get_toots(self) -> list[core.Toot]: 38 | return core.Toot.get_toots_since(self.end_time - self.timedelta) 39 | 40 | def sqlite_connection(self) -> sqlite3.Connection: 41 | return config.ConfigHandler.open_db() 42 | 43 | 44 | class BaseAlgorithm(abc.ABC): 45 | """ 46 | Base class for algorithms that render your timeline. You should implement 47 | this class to create your own algorithm. 48 | 49 | Abstract methods: 50 | - render: Run the model 51 | - train: Train the model 52 | 53 | Additionally, you may want to override this method to provide a custom UI for 54 | your algorithm: 55 | 56 | - render_model_params 57 | 58 | Note that objects of this class must be serializable via pickle. However, you 59 | can control how serialization works by overriding these methods: 60 | 61 | - serialize 62 | - deserialize 63 | """ 64 | 65 | @abc.abstractmethod 66 | def render(self, toots: list[core.Toot], context: "plugins.RenderContext") -> Renderable: 67 | """ 68 | Run the model and return a Renderable object. The algorithm object itself is typically 69 | deserialized before this method is called. 70 | 71 | :param toots: The toots to run the model on. This is typically 1 day of toots, 72 | or 6 hours, or whatever the user (you) has selected. 73 | 74 | :param context: A RenderContext object that you can use to render HTML. This 75 | is generally just passed to the Renderable object you return. 76 | """ 77 | raise NotImplementedError() 78 | 79 | @classmethod 80 | @abc.abstractmethod 81 | def train(cls, context: TrainContext, http_args: dict[str, str]) -> "BaseAlgorithm": 82 | """ 83 | Create an instance of this algorithm, and train it on toots obtained from the context. 84 | 85 | :param context: Context object where training data can be obtained.
86 | """ 87 | raise NotImplementedError() 88 | 89 | @classmethod 90 | def render_model_params(cls, context: "plugins.RenderContext") -> Response: 91 | """ 92 | Optionally, you can render HTML input elements that capture http_args passed 93 | to train(). This is useful if your algorithm has hyperparameters that you want 94 | to experiment with. 95 | """ 96 | return responses.HTMLResponse("") 97 | 98 | def serialize(self) -> bytes: 99 | return pickle.dumps(self) 100 | 101 | @staticmethod 102 | def deserialize(data: bytes) -> "BaseAlgorithm": 103 | return pickle.loads(data) -------------------------------------------------------------------------------- /fossil_mastodon/app/index.js: -------------------------------------------------------------------------------- 1 | import '@polymer/app-layout/app-layout.js'; 2 | -------------------------------------------------------------------------------- /fossil_mastodon/app/static/htmx.js: -------------------------------------------------------------------------------- 1 | Found.
Redirecting to /htmx.org@1.9.10 -------------------------------------------------------------------------------- /fossil_mastodon/app/static/logo-light.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 7 | 8 | Created by potrace 1.16, written by Peter Selinger 2001-2019 9 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 | 26 | 28 | 30 | 32 | 34 | 36 | 38 | 40 | 42 | 44 | 46 | 48 | 50 | 52 | 56 | 58 | 60 | 62 | 64 | 66 | 68 | 70 | 74 | 76 | 79 | 81 | 83 | 85 | 87 | 89 | 91 | 93 | 97 | 99 | 101 | 103 | 105 | 107 | 109 | 113 | 117 | 121 | 124 | 127 | 131 | 134 | 136 | 139 | 142 | 144 | 146 | 147 | fossil 148 | 149 | -------------------------------------------------------------------------------- /fossil_mastodon/app/static/page.js: -------------------------------------------------------------------------------- 1 | 2 | const stickyElm = document.querySelector('.cluster .title') 3 | 4 | const observer = new IntersectionObserver( 5 | ([e]) => e.target.classList.toggle('isSticky', e.intersectionRatio < 1), 6 | {threshold: [1]} 7 | ); 8 | 9 | observer.observe(stickyElm) -------------------------------------------------------------------------------- /fossil_mastodon/app/static/style.css: -------------------------------------------------------------------------------- 1 | html { 2 | font-size: 100%; 3 | } 4 | 5 | body { 6 | background-color: #262222; 7 | color: #ddd; 8 | font-family: "Arial", sans-serif; 9 | max-width: 80rem; 10 | } 11 | 12 | a { 13 | color: #ddd; 14 | /* text-decoration: none; */ 15 | } 16 | 17 | .decl { 18 | background-color: #3f3544; 19 | border-radius: 0.5rem; 20 | border-width: 1px; 21 | margin-bottom: 1rem; 22 | padding: 0.5rem; 23 | } 24 | 25 | .model-param-label { 26 | padding-top: 2rem; 27 | padding-left: 2rem; 28 | } 29 | 30 | .row { 31 | display: flex; 32 | flex-direction: row; 33 | align-items: center; 34 | } 35 | .row * { 36 | margin: 0 0.5rem; 37 | } 38 | 39 | select { 40 | margin: 0.5rem; 41 | 
font-size: 1rem; 42 | padding: 0.5rem; 43 | padding-right: 1rem; 44 | border-radius: 4px; 45 | border: 1px solid #888888; 46 | background-color: #333333; 47 | color: #ffffff; 48 | } 49 | 50 | input[type="text"],input[type="password"] { 51 | margin: 0.5rem; 52 | font-size: 1rem; 53 | padding: 0.5rem; 54 | padding-right: 1rem; 55 | border-radius: 4px; 56 | border: 1px solid #888888; 57 | background-color: #333333; 58 | color: #ffffff; 59 | } 60 | 61 | h1.nav { 62 | display: flex; 63 | align-items: center; 64 | justify-content: space-between; /* This will keep the items spaced apart */ 65 | width: 100%; 66 | } 67 | 68 | .nav img { 69 | height: 3rem; 70 | width: 7rem; 71 | } 72 | 73 | /******** 74 | Navigation tabs 75 | *********/ 76 | .nav-tabs { 77 | margin-bottom: 2rem; 78 | border-bottom: 2px solid #555; 79 | } 80 | .nav-tab { 81 | font-size: 1.5rem; 82 | padding: 0.5rem; 83 | margin: 0rem 0.25rem 0rem 0.25rem; 84 | border-top-left-radius: 0.5rem; 85 | border-top-right-radius: 0.5rem; 86 | } 87 | .nav-tab:hover { 88 | background-color: #555; 89 | } 90 | 91 | .nav-tab a { 92 | text-decoration: none; 93 | } 94 | .nav-tabs div.active { 95 | background-color: #555555; 96 | } 97 | .nav-tabs .back-button:hover { 98 | border-top-left-radius: 0.5rem; 99 | border-top-right-radius: 0.5rem; 100 | background-color: #555; 101 | } 102 | 103 | .back-button { 104 | text-decoration: none; 105 | padding-right: 1rem; 106 | } 107 | 108 | .back-button::before { 109 | display: inline-block; 110 | width: 0.92rem; 111 | height: 1.8rem; 112 | vertical-align: -0.125rem; 113 | padding: 0.5rem 0.5rem 0.5rem 0rem; 114 | /* edit the fill=%23... 
to change color */ 115 | content: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 1472 1600'%3E%3Cpath fill='%23ccc' d='M1472 736v128q0 53-32.5 90.5T1355 992H651l293 294q38 36 38 90t-38 90l-75 76q-37 37-90 37q-52 0-91-37L37 890Q0 853 0 800q0-52 37-91L688 59q38-38 91-38q52 0 90 38l75 74q38 38 38 91t-38 91L651 608h704q52 0 84.5 37.5T1472 736'/%3E%3C/svg%3E"); 116 | } 117 | 118 | 119 | /******** 120 | Toot display 121 | *********/ 122 | .author img { 123 | width: 2rem; 124 | height: 2rem; 125 | border-radius: 50%; 126 | margin-right: 0.5rem; 127 | } 128 | 129 | .toot { 130 | background-color: #333333; 131 | padding: 1rem; 132 | border-radius: 4px; 133 | } 134 | 135 | .toot:not(:last-child) { 136 | margin-bottom: 1rem; 137 | } 138 | 139 | .toot .button-bar { 140 | display: flex; 141 | justify-content: flex-end; 142 | margin-top: 0.5rem; 143 | } 144 | 145 | .toot .button-bar button { 146 | background-color: transparent; 147 | margin-left: 0.5rem; 148 | } 149 | 150 | .toot .button-bar button:hover { 151 | background-color: #888888; 152 | margin-left: 0.5rem; 153 | } 154 | 155 | .toot .content img { 156 | max-width: 100%; 157 | /* margin-bottom: 10px; */ 158 | border-radius: 0.5rem; 159 | } 160 | 161 | .toot .content a { 162 | /* long links cause horizontal scroll */ 163 | word-wrap: break-word; 164 | } 165 | 166 | #toots { 167 | margin: 2.5rem 0.5rem; 168 | } 169 | 170 | /******** 171 | Cluster 172 | *********/ 173 | 174 | .cluster { 175 | border-width: 2px; 176 | border-radius: 4px; 177 | margin: 1rem 0rem; 178 | } 179 | 180 | .cluster .title { 181 | padding: 0.5rem; 182 | font-size: 1.5rem; 183 | font-weight: bold; 184 | background-color: #555555; 185 | cursor: pointer; 186 | position: sticky; 187 | top: 0; 188 | 189 | /* HACK: I'd rather only control the height when sticky, but that seems hard */ 190 | max-height: 5em; 191 | overflow: scroll; 192 | } 193 | 194 | .cluster[data-open="false"] .title { 195 | border-radius: 4px; 196 | } 197 | 
198 | .cluster[data-open="true"] .title { 199 | border-bottom: 2px solid #888888; 200 | border-radius: 4px 0px; 201 | } 202 | 203 | .cluster[data-open="false"] .content { 204 | display: none; 205 | } 206 | 207 | .cluster .content button { 208 | display: flex; 209 | justify-content: flex-end; 210 | } 211 | 212 | /******** 213 | Button 214 | *********/ 215 | 216 | button { 217 | background-color: #666666; 218 | color: #ffffff; 219 | border: none; 220 | padding: 0.5rem 1rem; 221 | font-size: 1rem; 222 | font-weight: bold; 223 | border-radius: 4px; 224 | cursor: pointer; 225 | } 226 | 227 | button:hover { 228 | background-color: #888888; 229 | } 230 | 231 | button:focus { 232 | outline: none; 233 | box-shadow: 0 0 0 2px #ffffff; 234 | } 235 | 236 | /******** 237 | Radio 238 | *********/ 239 | 240 | .radio { 241 | margin: 0.6rem 0px 0.6rem 0px; 242 | } 243 | 244 | .radio div { 245 | display: inline-block; 246 | position: relative; 247 | padding-left: 0px; 248 | margin: 0px 0px 0px 0px; 249 | cursor: pointer; 250 | font-size: 16px; 251 | font-weight: bold; 252 | } 253 | 254 | .radio div input[type="radio"] { 255 | display: none; 256 | } 257 | 258 | .radio div label { 259 | display: inline-block; 260 | padding: 0.5rem 1rem; 261 | margin: 0px -4px 0px -4px; 262 | background-color: #666666; 263 | color: #ffffff; 264 | cursor: pointer; 265 | } 266 | 267 | .radio div input[type="radio"]:checked + label { 268 | background-color: #888888; 269 | } 270 | 271 | .radio div:first-child label { 272 | border-top-left-radius: 4px; 273 | border-bottom-left-radius: 4px; 274 | } 275 | 276 | .radio div:last-child label { 277 | border-top-right-radius: 4px; 278 | border-bottom-right-radius: 4px; 279 | } 280 | 281 | /******** 282 | Slider 283 | *********/ 284 | 285 | input[type="range"] { 286 | width: 100%; 287 | max-width: 400px; 288 | margin: 0.5rem 0px 0.5rem 0px; 289 | } 290 | 291 | /******** 292 | Spinner 293 | *********/ 294 | 295 | .spinner:not(.htmx-request) { 296 | display: 
inline-block; 297 | vertical-align: middle; 298 | width: 2rem; 299 | height: 2rem; 300 | border-radius: 100%; 301 | visibility: hidden; 302 | } 303 | 304 | .spinner.htmx-request { 305 | display: inline-block; 306 | vertical-align: middle; 307 | width: 2rem; 308 | height: 2rem; 309 | border-radius: 100%; 310 | visibility: visible; 311 | opacity: 1; 312 | } 313 | 314 | /******** 315 | Hamburger 316 | *********/ 317 | 318 | .hamburger-launch { 319 | background-color: transparent; 320 | } 321 | .hamburger-launch:hover { 322 | background-color: #444444; 323 | } 324 | .hamburger-launch svg { 325 | fill: #ffffff; 326 | } 327 | 328 | .hamburger { 329 | display: none; 330 | cursor: pointer; 331 | } 332 | .hamburger.open { 333 | display: flex; 334 | flex-direction: column; 335 | justify-content: flex-start; 336 | align-items: flex-end; 337 | right: 0; 338 | top: 5rem; 339 | min-width: 8rem; 340 | position: absolute; 341 | z-index: 10; 342 | background-color: #333333; 343 | padding: 1rem 0rem; 344 | border-radius: 4px; 345 | } 346 | .hamburger.open a { 347 | padding: 0.5rem 1rem; 348 | text-decoration: none; 349 | } 350 | .hamburger.open a:hover { 351 | background-color: #444444; 352 | } 353 | -------------------------------------------------------------------------------- /fossil_mastodon/app/static/work-in-progress.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tkellogg/fossil/89db2fdbea96666f101e6e16e287fa23fcee0b9d/fossil_mastodon/app/static/work-in-progress.gif -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/bad_plugin.html: -------------------------------------------------------------------------------- 1 | {% block content %} 2 |
3 |

Error in function signature!

4 | 10 |

11 | This can happen for a variety of reasons. Check the logs for more information. 12 |

13 |
14 | {% endblock %} 15 | -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/base/admin.html: -------------------------------------------------------------------------------- 1 | {% extends "base/page.html" %} 2 | 3 | {% block content %} 4 | 15 | 16 | {% block core_content %} 17 | 18 | {% endblock %} 19 | {% endblock %} 20 | 21 | -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/base/page.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | fossil 5 | 6 | 7 | 8 | 9 | 10 | 11 | {% for item in head_html() %} 12 | {% autoescape false %} 13 | {{ item }} 14 | {% endautoescape %} 15 | {% endfor %} 16 | 17 | 18 |

19 |
20 | 21 |
22 |
23 | {% for item in extra_nav() %} 24 | {% autoescape false %} 25 | {{ item.html }} 26 | {% endautoescape %} 27 | {% endfor %} 28 | 31 |
32 |

33 |
34 | Home 35 | Settings 36 | {% for menu_item in extra_menu_items() %} 37 | {% autoescape false %} 38 | {{ menu_item.html }} 39 | {% endautoescape %} 40 | {% endfor %} 41 |
42 |
43 | {% block content %} 44 | 45 | {% endblock %} 46 |
47 | 48 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/index.html: -------------------------------------------------------------------------------- 1 | {% extends "base/page.html" %} 2 | 3 | {% block content %} 4 | 5 |
6 | 7 | 8 | 9 |
10 | 11 |
12 |
13 | 14 | 15 |
16 |
17 | 18 | 19 |
20 |
21 | 22 | 23 |
24 |
25 | 26 |
27 |
28 | 29 | 30 |
31 |
32 | 33 | 34 |
35 |
36 | 37 | 38 |
39 |
40 | 41 | 42 |
43 |
44 | 45 |
46 | Model Settings 47 |
48 | 59 |
60 |
61 | {% autoescape false %} 62 | {{ model_params }} 63 | {% endautoescape %} 64 |
65 | 66 |
67 | 70 | 71 |
72 |
73 | 74 |
75 |
76 | 77 |
78 | 79 |
80 | {% endblock %} 81 | -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/no_algorithm.html: -------------------------------------------------------------------------------- 1 | {% extends "base/page.html" %} 2 | 3 | {% block content %} 4 |
5 |

No algorithms installed!

6 |

7 | This can happen for a variety of reasons. Check the logs for more information. 8 |

9 |
10 | {% endblock %} 11 | -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/settings.html: -------------------------------------------------------------------------------- 1 | {% extends "base/admin.html" %} 2 | 3 | {% block core_content %} 4 |
5 |
6 | 7 |
8 | 17 |
18 |
19 | 20 | 21 |
22 | 31 |
32 |
33 |
34 | 35 |
36 |
37 | Keys 38 | {% for key in keys %} 39 |
40 |
{{ key }}
41 | 42 |
43 |
44 | {% endfor %} 45 | 46 |
47 |
48 |
49 | {% endblock %} -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/toot.html: -------------------------------------------------------------------------------- 1 |
2 | 3 | {{ toot.author }} 4 | {{ toot.created_at | rel_date }} 5 | 6 |
7 | {% autoescape false %} 8 | {{ toot.content }} 9 | {% endautoescape %} 10 |
11 | {% for attachment in toot.media_attachments %} 12 | 13 | {% else %} 14 | {% if toot.card_url %} 15 | 16 | {% endif %} 17 | {% endfor %} 18 |
19 |
20 |
21 | 22 | 23 | 24 | {% autoescape false %} 25 | {{ ctx.render_toot_display_plugins(toot) }} 26 | {% endautoescape %} 27 |
28 |
-------------------------------------------------------------------------------- /fossil_mastodon/app/templates/toot_clusters.html: -------------------------------------------------------------------------------- 1 | 14 | 15 |
16 | {{ clusters.num_toots }} toots from {{ clusters.min_date | rel_date }} to {{ clusters.max_date | rel_date }} 17 |
18 | 19 | {% for cluster in clusters.clusters %} 20 |
21 |
22 | {{ cluster.name }} ({{ cluster.toots | length }} Toots) 23 |
24 |
25 | {% set toots = cluster.toots %} 26 | {% include 'toot_list.html' %} 27 |
28 |
29 | {% endfor %} -------------------------------------------------------------------------------- /fossil_mastodon/app/templates/toot_list.html: -------------------------------------------------------------------------------- 1 | {% for toot in toots %} 2 | {% include 'toot.html' %} 3 | {% endfor %} -------------------------------------------------------------------------------- /fossil_mastodon/config.py: -------------------------------------------------------------------------------- 1 | import atexit 2 | import json 3 | import os 4 | import pathlib 5 | import random 6 | import shutil 7 | import sqlite3 8 | import string 9 | from collections import defaultdict 10 | 11 | import llm 12 | import pydantic 13 | from dotenv import dotenv_values 14 | 15 | 16 | def get_config_var(var_name: str, default): 17 | return dotenv_values().get(var_name, os.environ.get(var_name, default)) 18 | 19 | class Model(pydantic.BaseModel): 20 | name: str 21 | context_length: int 22 | 23 | class _ConfigValueNotFound(): 24 | pass 25 | 26 | ConfigValueNotFound = _ConfigValueNotFound() 27 | 28 | class _ConfigHandler(): 29 | # Default fallbacks for variables defined in either .env or environment 30 | _config_var_defaults = { 31 | "DATABASE_PATH": "fossil.db", 32 | "OPENAI_KEY": "", 33 | "OPENAI_API_BASE": "https://api.openai.com/v1", 34 | "MASTO_BASE": "https://hachyderm.io", 35 | } 36 | 37 | _model_lengths = defaultdict( 38 | lambda: 2048, 39 | {"gpt-3.5-turbo": 4097, "ada-002": 8191} 40 | ) 41 | 42 | _model_cache = {} 43 | 44 | def __getattr__(self, item: str): 45 | c_val = get_config_var(item, self._config_var_defaults.get(item, ConfigValueNotFound)) 46 | 47 | if isinstance(c_val, _ConfigValueNotFound): 48 | raise AttributeError(f"{item} is not defined in either the environment or .env file") 49 | return c_val 50 | 51 | def _get_from_session(self, session_id: str| None, item: str) -> str: 52 | if not session_id: 53 | return "" 54 | with self.open_db() as conn: 55 | c = conn.cursor() 56 |
c.execute('SELECT settings FROM sessions WHERE id = ?', [session_id]) 57 | row = c.fetchone() 58 | try: 59 | return json.loads(row[0]).get(item, "") 60 | except (json.decoder.JSONDecodeError, IndexError, TypeError): 61 | return "" 62 | 63 | def open_db(self) -> sqlite3.Connection: 64 | return sqlite3.connect(self.DATABASE_PATH) 65 | 66 | def EMBEDDING_MODEL(self, session_id: str|None = None) -> Model: 67 | c_val = self._get_from_session(session_id, "embedding_model") 68 | if not c_val: 69 | c_val = get_config_var("EMBEDDING_MODEL", "ada-002") 70 | 71 | if c_val not in self._model_cache: 72 | self._model_cache[c_val] = Model(name=c_val, context_length=self._model_lengths[c_val]) 73 | 74 | return self._model_cache[c_val] 75 | 76 | def SUMMARIZE_MODEL(self, session_id: str|None = None) -> Model: 77 | c_val = self._get_from_session(session_id, "summarize_model") 78 | if not c_val: 79 | c_val = get_config_var("SUMMARIZE_MODEL", "gpt-3.5-turbo") 80 | 81 | if c_val not in self._model_cache: 82 | self._model_cache[c_val] = Model(name=c_val, context_length=self._model_lengths[c_val]) 83 | 84 | return self._model_cache[c_val] 85 | 86 | 87 | ConfigHandler = _ConfigHandler() 88 | 89 | 90 | def headers(): 91 | return {"Authorization": f"Bearer {ConfigHandler.ACCESS_TOKEN}"} 92 | 93 | def get_installed_llms() -> set[str]: 94 | return {m.model.model_id for m in llm.get_models_with_aliases()} 95 | 96 | def get_installed_embedding_models() -> set[str]: 97 | return {m.model.model_id for m in llm.get_embedding_models_with_aliases()} 98 | 99 | 100 | # Static files 101 | class StaticFiles(pydantic.BaseModel): 102 | """ 103 | This manages static files so that the user can `pip install fossil-mastodon` and it runs 104 | fine. 105 | 106 | This copies all files into a randomly named directory under ~/.cache/fossil-mastodon and deletes them as the program exits. This 107 | seems to work fine even in the dev workflow, since this module gets re-run every time 108 | uvicorn reloads the server.
109 | """ 110 | class Config: 111 | arbitrary_types_allowed = True 112 | base_path: pathlib.Path 113 | assets_path: pathlib.Path 114 | templates_path: pathlib.Path 115 | 116 | # HACK: Alright, I admit it, this is crazy. Here's the thing: we need to use shutil.rmtree in 117 | # the destructor, but the destructor runs at a very weird time. I observed it running after 118 | # the shutil module had been unloaded, so I was getting NoneType for the module. Obvs the 119 | # simple solution is this: make the function live longer than this object by capturing a reference. 120 | rmtree = staticmethod(shutil.rmtree)  # staticmethod, so self.rmtree(path) doesn't receive self as the path 121 | 122 | @classmethod 123 | def from_env(cls) -> "StaticFiles": 124 | src_path = pathlib.Path(__file__).parent / "app" 125 | # I used to use tempfile, but MacOS deletes temp files every 3 days, so I needed to move 126 | # to a more permanent location. 127 | dst_path = pathlib.Path(os.path.expanduser(f"~/.cache/fossil-mastodon/{''.join(random.choices(string.ascii_lowercase, k=10))}")) 128 | dst_path.mkdir(parents=True) 129 | shutil.copytree(src_path / "static", dst_path / "static") 130 | shutil.copytree(src_path / "templates", dst_path / "templates") 131 | 132 | obj = cls( 133 | base_path=dst_path, 134 | assets_path=dst_path / "static", 135 | templates_path=dst_path / "templates", 136 | ) 137 | 138 | atexit.register(obj.cleanup) 139 | 140 | return obj 141 | 142 | def add_dir(self, path: pathlib.Path, mount_path: str): 143 | shutil.copytree(path, self.base_path / mount_path, dirs_exist_ok=True) 144 | 145 | def cleanup(self): 146 | self.rmtree(self.base_path, ignore_errors=True)  # may run twice (atexit and __del__) 147 | 148 | def __del__(self): 149 | self.cleanup() 150 | 151 | 152 | ASSETS = StaticFiles.from_env() 153 | 154 | def get_db_path(conn: sqlite3.Connection) -> str: 155 | return conn.execute("PRAGMA database_list").fetchone()[2] -------------------------------------------------------------------------------- /fossil_mastodon/core.py: --------------------------------------------------------------------------------
1 | import datetime 2 | import functools 3 | import importlib 4 | import json 5 | import logging 6 | import random 7 | import sqlite3 8 | import string 9 | import traceback 10 | import typing 11 | from typing import Optional, Type 12 | 13 | import html2text 14 | import llm 15 | import numpy as np 16 | from pydantic import BaseModel 17 | import requests 18 | import tiktoken 19 | 20 | from fossil_mastodon import config, migrations 21 | 22 | if typing.TYPE_CHECKING: 23 | from fossil_mastodon import algorithm 24 | 25 | 26 | logger = logging.getLogger(__name__) 27 | 28 | 29 | @functools.lru_cache() 30 | def _get_json(toot: "Toot") -> dict: 31 | # meh, this isn't great, but it works 32 | import json 33 | return json.loads(toot.orig_json) 34 | 35 | 36 | class MediaAttatchment(BaseModel): 37 | type: str | None 38 | preview_url: str | None 39 | url: str | None 40 | 41 | 42 | class Toot(BaseModel): 43 | class Config: 44 | arbitrary_types_allowed = True 45 | id: int | None = None 46 | content: str | None 47 | author: str | None 48 | url: str | None 49 | created_at: datetime.datetime 50 | embedding: np.ndarray | None = None 51 | orig_json: str | None = None 52 | cluster: str | None = None # Added cluster property 53 | 54 | @property 55 | def orig_dict(self) -> dict: 56 | return _get_json(self) 57 | 58 | @property 59 | def avatar_url(self) -> str | None: 60 | return self.orig_dict.get("account", {}).get("avatar") 61 | 62 | @property 63 | def profile_url(self) -> str | None: 64 | return self.orig_dict.get("account", {}).get("url") 65 | 66 | @property 67 | def display_name(self) -> str | None: 68 | return self.orig_dict.get("account", {}).get("display_name") 69 | 70 | @property 71 | def toot_id(self) -> str | None: 72 | return self.orig_dict.get("id") 73 | 74 | @property 75 | def is_reply(self) -> bool: 76 | return self.orig_dict.get("in_reply_to_id") is not None 77 | 78 | @property 79 | def media_attachments(self) -> list[MediaAttatchment]: 80 | return 
[MediaAttatchment(type=m.get("type"), url=m.get("url"), preview_url=m.get("preview_url")) 81 | for m in self.orig_dict.get("media_attachments", [])] 82 | 83 | @property 84 | def card_preview_url(self) -> str | None: 85 | return self.orig_dict.get("card", {}).get("image") 86 | 87 | @property 88 | def card_url(self) -> str | None: 89 | return self.orig_dict.get("card", {}).get("url") 90 | 91 | def __hash__(self): 92 | return hash(self.url) 93 | 94 | def __eq__(self, other): 95 | return self.url == other.url 96 | 97 | def save(self, init_conn: sqlite3.Connection | None = None) -> bool: 98 | try: 99 | if init_conn is None: 100 | conn = config.ConfigHandler.open_db() 101 | else: 102 | conn = init_conn 103 | migrations.create_database() 104 | c = conn.cursor() 105 | 106 | # Check if the URL already exists 107 | c.execute(''' 108 | SELECT COUNT(*) FROM toots WHERE url = ? and embedding is not null 109 | ''', (self.url,)) 110 | 111 | result = c.fetchone() 112 | url_exists = result[0] > 0 113 | 114 | if url_exists: 115 | # URL already exists, handle accordingly 116 | return False 117 | 118 | c.execute(''' 119 | DELETE FROM toots WHERE url = ? 120 | ''', (self.url,)) 121 | 122 | embedding = self.embedding.tobytes() if self.embedding is not None else bytes() 123 | c.execute(''' 124 | INSERT INTO toots (content, author, url, created_at, embedding, orig_json, cluster) 125 | VALUES (?, ?, ?, ?, ?, ?, ?) 
126 | ''', (self.content, self.author, self.url, self.created_at, embedding, self.orig_json, self.cluster)) 127 | 128 | except: 129 | conn.rollback() 130 | raise 131 | finally: 132 | if init_conn is None: 133 | conn.commit() 134 | return True 135 | 136 | @classmethod 137 | def get_toots_since(cls, since: datetime.datetime) -> list["Toot"]: 138 | migrations.create_database() 139 | with config.ConfigHandler.open_db() as conn: 140 | c = conn.cursor() 141 | 142 | c.execute(''' 143 | SELECT 144 | id, content, author, url, created_at, embedding, orig_json, cluster 145 | FROM toots WHERE created_at >= ? 146 | ''', (since,)) 147 | 148 | rows = c.fetchall() 149 | toots = [] 150 | for row in rows: 151 | toot = cls( 152 | id=row[0], 153 | content=row[1], 154 | author=row[2], 155 | url=row[3], 156 | created_at=row[4], 157 | embedding=np.frombuffer(row[5]) if row[5] else None, 158 | orig_json=row[6], 159 | cluster=row[7] # Added cluster property 160 | ) 161 | toots.append(toot) 162 | 163 | return toots 164 | 165 | @classmethod 166 | def get_by_id(cls, id: int) -> Optional["Toot"]: 167 | migrations.create_database() 168 | with config.ConfigHandler.open_db() as conn: 169 | c = conn.cursor() 170 | 171 | c.execute(''' 172 | SELECT 173 | id, content, author, url, created_at, embedding, orig_json, cluster 174 | FROM toots WHERE id = ? 
175 | ''', (id,)) 176 | 177 | row = c.fetchone() 178 | if row: 179 | toot = cls( 180 | id=row[0], 181 | content=row[1], 182 | author=row[2], 183 | url=row[3], 184 | created_at=row[4], 185 | embedding=np.frombuffer(row[5]) if row[5] else None, 186 | orig_json=row[6], 187 | cluster=row[7], # Added cluster property 188 | ) 189 | return toot 190 | return None 191 | 192 | @staticmethod 193 | def get_latest_date() -> datetime.datetime | None: 194 | migrations.create_database() 195 | with config.ConfigHandler.open_db() as conn: 196 | c = conn.cursor() 197 | 198 | c.execute(''' 199 | SELECT MAX(created_at) FROM toots 200 | -- fix issue where only part of the timeline is downloaded after an error 201 | WHERE embedding IS NOT NULL 202 | ''') 203 | 204 | result = c.fetchone() 205 | latest_date = result[0] if result[0] else None 206 | 207 | if isinstance(latest_date, str): 208 | try: 209 | latest_date = datetime.datetime.strptime(latest_date, "%Y-%m-%d %H:%M:%S.%f") 210 | except ValueError: 211 | latest_date = datetime.datetime.strptime(latest_date, "%Y-%m-%d %H:%M:%S") 212 | return latest_date 213 | 214 | @classmethod 215 | def from_dict(cls, data): 216 | import json 217 | 218 | if data.get("reblog"): 219 | return cls.from_dict(data["reblog"]) 220 | 221 | return cls( 222 | content=data.get("content"), 223 | author=data.get("account", {}).get("acct"), 224 | url=data.get("url"), 225 | created_at=datetime.datetime.strptime(data.get("created_at"), "%Y-%m-%dT%H:%M:%S.%fZ"), 226 | orig_json=json.dumps(data), 227 | ) 228 | 229 | def do_star(self): 230 | print("star", self.url) 231 | 232 | def do_boost(self): 233 | print("boost", self.url) 234 | 235 | 236 | def get_toots_since(since: datetime.datetime, session_id: str): 237 | assert isinstance(since, datetime.datetime), type(since) 238 | migrations.create_database() 239 | download_timeline(since, session_id) 240 | return Toot.get_toots_since(since) 241 | 242 | 243 | def download_timeline(since: datetime.datetime, session_id: str): 
244 | last_date = Toot.get_latest_date() 245 | logger.info(f"last toot date: {last_date}") 246 | last_date = last_date or since 247 | earliest_date = None 248 | buffer: list[Toot] = [] 249 | last_id = "" 250 | curr_url = f"{config.ConfigHandler.MASTO_BASE}/api/v1/timelines/home?limit=40" 251 | while not earliest_date or earliest_date > last_date: 252 | response = requests.get(curr_url, headers=config.headers()) 253 | response.raise_for_status() 254 | json = response.json() 255 | if not json: 256 | logger.info("No more toots") 257 | break 258 | if len(json) > 1: 259 | last_id = json[-1]["id"] 260 | logger.info(f"Got {len(json)} toots; earliest={earliest_date.isoformat() if earliest_date else None}, last_id={last_id}") 261 | for toot_dict in json: 262 | toot = Toot.from_dict(toot_dict) 263 | earliest_date = min(earliest_date or datetime.datetime.max, datetime.datetime.strptime(toot_dict["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ"))  # timeline entry's date (for a reblog, when it was boosted) 264 | buffer.append(toot) 265 | 266 | if "next" in response.links: 267 | curr_url = response.links["next"]["url"] 268 | else: 269 | break 270 | logger.info(f"done with toots; earliest={earliest_date.isoformat() if earliest_date else None}, last_date: {last_date.isoformat() if last_date else None}") 271 | 272 | page_size = 50 273 | if len(buffer) > 0: 274 | num_pages = len(buffer) // page_size + 1 275 | else: 276 | num_pages = 0 277 | for page in range(num_pages): 278 | start_index = page * page_size 279 | end_index = start_index + page_size 280 | page_toots = buffer[start_index:end_index] 281 | 282 | # Embed this page of toots, then persist them 283 | _create_embeddings(page_toots, session_id) 284 | with config.ConfigHandler.open_db() as conn: 285 | for toot in page_toots: 286 | toot.save(init_conn=conn) 287 | 288 | def _prepare_text(text: str) -> str: 289 | return html2text.html2text(text)[:1000] 290 | 291 | def _create_embeddings(toots: list[Toot], session_id: str): 292 | # Only text can be embedded, so skip toots with no content (e.g. image-only posts) 293 |
toots = [t for t in toots if t.content] 294 | 295 | # Call the llm embedding API to create embeddings 296 | # bugfix: The overall batch size seems to exceed the model's limit, so we need to split the batch into smaller chunks 297 | emb_model = llm.get_embedding_model(config.ConfigHandler.EMBEDDING_MODEL(session_id).name) 298 | total_size = 0 299 | batch = [] 300 | embeddings = [] 301 | measure = tiktoken.encoding_for_model("gpt-3.5-turbo") 302 | for toot in toots: 303 | text = _prepare_text(toot.content) 304 | new_tokens = len(measure.encode(text)) 305 | if total_size + new_tokens > 8000: 306 | embeddings.extend(emb_model.embed_batch(batch)) 307 | batch.clear() 308 | total_size = 0 309 | 310 | batch.append(text) 311 | total_size += new_tokens 312 | if len(batch) > 0: 313 | embeddings.extend(emb_model.embed_batch(batch)) 314 | batch.clear() 315 | 316 | # Attach each embedding to its toot (embeddings are in the same order as the batched texts) 317 | print(f"got {len(embeddings)} embeddings") 318 | for i, toot in enumerate(toots): 319 | toot.embedding = np.array(embeddings[i]) 320 | 321 | # Return the toots, now with embeddings attached 322 | return toots 323 | 324 | 325 | class Settings(BaseModel): 326 | embedding_model: str | None = None 327 | summarize_model: str | None = None 328 | 329 | 330 | class Session(BaseModel): 331 | id: str 332 | algorithm_spec: str | None = None 333 | algorithm: bytes | None = None 334 | ui_settings: str | None = None 335 | settings: Settings 336 | name: str 337 | 338 | def set_ui_settings(self, ui_settings: dict[str, str]): 339 | self.ui_settings = json.dumps(ui_settings) 340 | self.save() 341 | 342 | def get_ui_settings(self) -> dict[str, str]: 343 | return json.loads(self.ui_settings or "{}") 344 | 345 | def get_algorithm_type(self) -> Type["algorithm.BaseAlgorithm"] | None: 346 | try: 347 | spec = json.loads(self.algorithm_spec) if self.algorithm_spec else {} 348 | if "module" in spec and "class_name" in spec: 349 | mod = importlib.import_module(spec["module"]) 350 | return getattr(mod,
spec["class_name"]) 351 | return None 352 | except ModuleNotFoundError: 353 | traceback.print_exc() 354 | return None 355 | 356 | @classmethod 357 | def get_by_id(cls, id: str) -> Optional["Session"]: 358 | migrations.create_database() 359 | migrations.create_session_table() 360 | with config.ConfigHandler.open_db() as conn: 361 | print(f"Getting session; path={config.get_db_path(conn)}") 362 | c = conn.cursor() 363 | 364 | c.execute(''' 365 | SELECT id, algorithm_spec, algorithm, ui_settings, settings, name FROM sessions WHERE id = ? 366 | ''', (id,)) 367 | 368 | row = c.fetchone() 369 | if row: 370 | session = cls( 371 | id=row[0], 372 | algorithm_spec=row[1], 373 | algorithm=row[2], 374 | ui_settings=row[3], 375 | settings=Settings(**json.loads(row[4] or "{}")), 376 | name=row[5], 377 | ) 378 | return session 379 | return None 380 | 381 | @classmethod 382 | def get_or_create(cls, name: str = "Main") -> "Session": 383 | migrations.create_database() 384 | migrations.create_session_table() 385 | with config.ConfigHandler.open_db() as conn: 386 | c = conn.cursor() 387 | c.execute(""" SELECT id FROM sessions WHERE name IS NOT NULL ORDER BY name DESC LIMIT 1 """) 388 | row = c.fetchone() 389 | if row: 390 | # this is dumb. the ai did it. it's also brilliant. slightly inefficient, but whatever. clean.
391 | obj = cls.get_by_id(row[0]) 392 | assert obj is not None 393 | return obj 394 | else: 395 | rand_str = "".join(random.choice(string.ascii_lowercase) for _ in range(32)) 396 | obj = cls(id=rand_str, settings=Settings(), name=name) 397 | obj.save(init_conn=conn) 398 | return obj 399 | 400 | def save(self, init_conn: sqlite3.Connection | None = None) -> bool: 401 | try: 402 | if init_conn is None: 403 | conn = config.ConfigHandler.open_db() 404 | else: 405 | conn = init_conn 406 | migrations.create_database() 407 | migrations.create_session_table() 408 | c = conn.cursor() 409 | 410 | c.execute(''' 411 | INSERT INTO sessions (id, algorithm_spec, algorithm, ui_settings, settings, name) 412 | VALUES (?, ?, ?, ?, ?, ?) 413 | ON CONFLICT(id) DO UPDATE 414 | SET algorithm_spec = excluded.algorithm_spec 415 | , algorithm = excluded.algorithm 416 | , ui_settings = excluded.ui_settings 417 | , settings = excluded.settings 418 | , name = excluded.name 419 | ''', (self.id, self.algorithm_spec, self.algorithm, self.ui_settings, self.settings.model_dump_json(), self.name)) 420 | 421 | if init_conn is None: 422 | conn.commit() 423 | except: 424 | conn.rollback() 425 | raise 426 | return True 427 | -------------------------------------------------------------------------------- /fossil_mastodon/migrations.py: -------------------------------------------------------------------------------- 1 | """ 2 | Migration scripts to update the SQLite schema that are run at the last possible moment. 3 | There are no version numbers, so each script is responsible for "knowing" when it needs 4 | to run itself. 5 | 6 | Typically you should have a @lru_cache on each function to prevent unnecessary invocations, 7 | but also know that it'll get re-invoked every time the server restarts.
8 | """ 9 | import functools 10 | import random 11 | import sqlite3 12 | import string 13 | 14 | from fossil_mastodon import config 15 | 16 | class migration: 17 | """ 18 | Decorator that tracks all migration functions. 19 | """ 20 | all: list["migration"] = [] 21 | __counter = 0 22 | 23 | def __init__(self, func: callable): 24 | self.func = func 25 | self.cached = functools.lru_cache()(func) 26 | migration.__counter += 1 27 | self.id = migration.__counter 28 | migration.all.append(self) 29 | 30 | def __call__(self, *args, **kwargs): 31 | return self.cached(*args, **kwargs) 32 | 33 | 34 | @migration 35 | def create_database(): 36 | with config.ConfigHandler.open_db() as conn: 37 | c = conn.cursor() 38 | 39 | # Create the toots table if it doesn't exist 40 | c.execute(''' 41 | CREATE TABLE IF NOT EXISTS toots ( 42 | id INTEGER PRIMARY KEY AUTOINCREMENT, 43 | content TEXT, 44 | author TEXT, 45 | url TEXT, 46 | created_at DATETIME, 47 | embedding BLOB, 48 | orig_json TEXT, 49 | cluster TEXT -- Added cluster column 50 | ) 51 | ''') 52 | 53 | conn.commit() 54 | 55 | 56 | @migration 57 | def create_session_table(): 58 | create_database() 59 | with config.ConfigHandler.open_db() as conn: 60 | c = conn.cursor() 61 | 62 | # Create the sessions table if it doesn't exist 63 | c.execute(''' 64 | CREATE TABLE IF NOT EXISTS sessions ( 65 | id TEXT PRIMARY KEY, 66 | algorithm_spec TEXT, 67 | algorithm BLOB, 68 | ui_settings TEXT 69 | ) 70 | ''') 71 | 72 | try: 73 | c.execute(''' 74 | ALTER TABLE sessions ADD COLUMN settings TEXT 75 | ''') 76 | except sqlite3.OperationalError: 77 | pass 78 | 79 | # Add session name 80 | try: 81 | c.execute(''' 82 | ALTER TABLE sessions ADD COLUMN name TEXT 83 | ''') 84 | except sqlite3.OperationalError: 85 | pass 86 | 87 | c.execute("DELETE FROM sessions WHERE name IS NULL") 88 | 89 | c2 = conn.cursor() 90 | c2.execute("SELECT COUNT(*) FROM sessions") 91 | row_count = c2.fetchone()[0] 92 | if row_count == 0: 93 | rand_str =
"".join(random.choice(string.ascii_lowercase) for _ in range(32)) 94 | c2.execute(""" 95 | INSERT INTO sessions (id, name, settings) 96 | VALUES (?, ?, '{}') 97 | """, (rand_str, "Main")) 98 | 99 | conn.commit() -------------------------------------------------------------------------------- /fossil_mastodon/plugin_impl/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tkellogg/fossil/89db2fdbea96666f101e6e16e287fa23fcee0b9d/fossil_mastodon/plugin_impl/__init__.py -------------------------------------------------------------------------------- /fossil_mastodon/plugin_impl/toot_debug.py: -------------------------------------------------------------------------------- 1 | from fastapi import responses 2 | 3 | from fossil_mastodon import plugins, core 4 | 5 | 6 | plugin = plugins.Plugin( 7 | name="Toot Debug Button", 8 | description="Adds a button to toots that prints the toot's JSON to the server's console.", 9 | ) 10 | 11 | 12 | @plugin.api_operation.post("/plugins/toot_debug/{id}") 13 | async def toots_debug(id: int): 14 | toot = core.Toot.get_by_id(id) 15 | if toot is not None: 16 | import json 17 | print(json.dumps(toot.orig_dict, indent=2)) 18 | return responses.HTMLResponse("
💯
") 19 | 20 | 21 | @plugin.toot_display_button 22 | def get_response(toot: core.Toot, context: plugins.RenderContext) -> responses.Response: 23 | return responses.HTMLResponse(f""" 24 | 25 | """) -------------------------------------------------------------------------------- /fossil_mastodon/plugin_impl/topic_cluster.py: -------------------------------------------------------------------------------- 1 | import functools 2 | import random 3 | import string 4 | import llm 5 | import numpy as np 6 | import pydantic 7 | import tiktoken 8 | from fastapi import Response, responses 9 | from sklearn.cluster import KMeans 10 | from tqdm import trange 11 | 12 | from fossil_mastodon import algorithm, config, core, migrations, plugins, ui 13 | 14 | 15 | plugin = plugins.Plugin( 16 | name="Topic Cluster", 17 | description="Cluster toots by topic", 18 | ) 19 | 20 | 21 | class ClusterRenderer(algorithm.Renderable, pydantic.BaseModel): 22 | clusters: list[ui.TootCluster] 23 | context: plugins.RenderContext 24 | 25 | def render(self, **response_args) -> Response: 26 | toot_clusters = ui.TootClusters(clusters=self.clusters) 27 | return self.context.templates.TemplateResponse("toot_clusters.html", { 28 | "clusters": toot_clusters, 29 | **self.context.template_args(), 30 | }, 31 | **response_args) 32 | 33 | 34 | @migrations.migration 35 | def _create_table(): 36 | with config.ConfigHandler.open_db() as conn: 37 | c = conn.cursor() 38 | 39 | # Create the toots table if it doesn't exist 40 | c.execute(''' 41 | CREATE TABLE IF NOT EXISTS topic_cluster_toots ( 42 | id INTEGER PRIMARY KEY AUTOINCREMENT, 43 | toot_id INTEGER NOT NULL, 44 | model_version TEXT NOT NULL, 45 | cluster_id INTEGER NOT NULL, 46 | updated_at DATETIME DEFAULT CURRENT_TIMESTAMP 47 | ) 48 | ''') 49 | 50 | conn.commit() 51 | 52 | 53 | class TootModel(pydantic.BaseModel): 54 | """ 55 | Cache for the cluster id of a toot. 
The model_version is used to invalidate the cache if 56 | the model is retrained, since that would lead to an incompatible set of clusters. 57 | 58 | We can't store this inside the model because it's dynamic and created after the model is 59 | trained. 60 | """ 61 | id: int | None 62 | toot_id: int 63 | model_version: str 64 | cluster_id: int | None 65 | 66 | @classmethod 67 | def for_toots(cls, toots: list[core.Toot], model_version: str) -> list["TootModel"]: 68 | _create_table() 69 | with config.ConfigHandler.open_db() as conn: 70 | c = conn.cursor() 71 | c.execute(''' 72 | SELECT id, toot_id, model_version, cluster_id 73 | FROM topic_cluster_toots 74 | WHERE model_version = ? 75 | ''', (model_version, )) 76 | from_db = {row[1]: cls(id=row[0], toot_id=row[1], model_version=row[2], cluster_id=row[3]) for row in c.fetchall()} 77 | return [ 78 | from_db.get( 79 | toot.id, 80 | cls(id=None, toot_id=toot.id, model_version=model_version, cluster_id=None), 81 | ) 82 | for toot in toots 83 | ] 84 | 85 | def save(self): 86 | _create_table() 87 | if self.cluster_id is None: 88 | raise ValueError("Cannot save a toot model without a cluster_id") 89 | 90 | if isinstance(self.cluster_id, np.number): 91 | raise ValueError("cluster_id must be an int, not a numpy type") 92 | 93 | with config.ConfigHandler.open_db() as conn: 94 | c = conn.cursor() 95 | if self.id is None: 96 | c.execute(''' 97 | INSERT INTO topic_cluster_toots (toot_id, model_version, cluster_id) 98 | VALUES (?, ?, ?) 99 | ''', (self.toot_id, self.model_version, self.cluster_id)) 100 | self.id = c.lastrowid 101 | else: 102 | c.execute(''' 103 | UPDATE topic_cluster_toots 104 | SET cluster_id = ?, updated_at = CURRENT_TIMESTAMP 105 | WHERE id = ? 
106 | ''', (self.cluster_id, self.id)) 107 | conn.commit() 108 | 109 | 110 | @plugin.algorithm 111 | class TopicCluster(algorithm.BaseAlgorithm): 112 | def __init__(self, kmeans: KMeans, labels: dict[int, str], model_version: str | None = None): 113 | self.kmeans = kmeans 114 | self.labels = labels 115 | self.model_version = model_version 116 | 117 | def render(self, toots: list[core.Toot], context: plugins.RenderContext) -> ClusterRenderer: 118 | before = len(toots) 119 | toots = [toot for toot in toots if toot.embedding is not None] 120 | toot_models = TootModel.for_toots(toots, model_version=self.model_version) 121 | print("Removed", before - len(toots), "toots with no embedding (probably image-only).", f"{len(toots)} toots remaining.") 122 | 123 | # Assign clusters to the uncached toots 124 | unassigned = [toot for toot, toot_model in zip(toots, toot_models) if toot_model.cluster_id is None] 125 | if len(unassigned) > 0: 126 | unassigned_models = [toot_model for toot_model in toot_models if toot_model.cluster_id is None] 127 | cluster_indices = self.kmeans.predict(np.array([toot.embedding for toot in unassigned])) 128 | print(f"Assigning clusters for {len(unassigned)} toots; model_version={self.model_version}") 129 | for toot, cluster_index, toot_model in zip(unassigned, cluster_indices, unassigned_models): 130 | toot.cluster = self.labels[cluster_index] 131 | toot_model.cluster_id = int(cluster_index) 132 | toot_model.save() 133 | 134 | toot_clusters = ui.TootClusters( 135 | clusters=[ 136 | ui.TootCluster( 137 | id=i_cluster, 138 | name=cluster_label, 139 | toots=[toot for toot, toot_model in zip(toots, toot_models) if toot_model.cluster_id == i_cluster], 140 | ) 141 | for i_cluster, cluster_label in self.labels.items() 142 | ] 143 | ) 144 | return ClusterRenderer(clusters=toot_clusters.clusters, context=context) 145 | 146 | @classmethod 147 | def train(cls, context: algorithm.TrainContext, args: dict[str, str]) -> "TopicCluster": 148 | toots = [toot for toot 
in context.get_toots() if toot.embedding is not None] 149 | 150 | n_clusters = int(args["num_clusters"]) 151 | if len(toots) < n_clusters: 152 | return cls(kmeans=NoopKMeans(n_clusters=1), labels={0: "All toots"}, model_version="".join(random.choice(string.ascii_lowercase) for _ in range(12))) 153 | 154 | embeddings = np.array([toot.embedding for toot in toots]) 155 | kmeans = KMeans(n_clusters=n_clusters) 156 | cluster_labels = kmeans.fit_predict(embeddings) 157 | 158 | labels: dict[int, str] = {} 159 | model = llm.get_model(config.ConfigHandler.SUMMARIZE_MODEL(context.session_id).name) 160 | for i_clusters in trange(n_clusters): 161 | clustered_toots = [toot for toot, cluster_label in zip(toots, cluster_labels) if cluster_label == i_clusters] 162 | combined_text = "\n\n".join([toot.content for toot in clustered_toots]) 163 | 164 | # Use the summarizing model to summarize the combined text 165 | prompt = f"Create a single label that describes all of these related tweets, make it succinct but descriptive. The label should describe all {len(clustered_toots)} of these\n\n{combined_text}" 166 | summary = model.prompt(reduce_size(context.session_id, prompt)).text().strip() 167 | labels[int(i_clusters)] = summary 168 | 169 | model_version = "".join(random.choice(string.ascii_lowercase) for _ in range(12)) 170 | return cls(kmeans=kmeans, labels=labels, model_version=model_version) 171 | 172 | @staticmethod 173 | def render_model_params(context: plugins.RenderContext) -> Response: 174 | default = context.session.get_ui_settings().get("num_clusters", "15") 175 | return responses.HTMLResponse(f"""
177 | 178 | {default} clusters 179 |
180 | """) 181 | 182 | def get_encoding(session_id: str): 183 | try: 184 | return tiktoken.encoding_for_model(config.ConfigHandler.SUMMARIZE_MODEL(session_id).name) 185 | except KeyError: 186 | encoding_name = tiktoken.list_encoding_names()[-1] 187 | return tiktoken.get_encoding(encoding_name) 188 | 189 | def reduce_size(session_id: str, text: str, model_limit: int = -1, est_output_size: int = 500) -> str: 190 | if model_limit < 0: 191 | model_limit = config.ConfigHandler.SUMMARIZE_MODEL(session_id).context_length 192 | tokens = get_encoding(session_id).encode(text) 193 | return get_encoding(session_id).decode(tokens[:model_limit - est_output_size]) 194 | 195 | 196 | class NoopKMeans(KMeans): 197 | def predict(self, X, y=None, sample_weight=None): 198 | return np.zeros(len(X), dtype=int) 199 | -------------------------------------------------------------------------------- /fossil_mastodon/plugins.py: -------------------------------------------------------------------------------- 1 | import abc 2 | import contextlib 3 | import functools 4 | import inspect 5 | import logging 6 | import pathlib 7 | import re 8 | import sys 9 | import traceback 10 | from typing import Callable, Type, TYPE_CHECKING 11 | 12 | from fastapi import FastAPI, Request, responses, templating 13 | import pkg_resources 14 | import pydantic 15 | 16 | from fossil_mastodon import algorithm, config, ui, core 17 | 18 | if TYPE_CHECKING: 19 | from fossil_mastodon import server 20 | 21 | 22 | logger = logging.getLogger(__name__) 23 | 24 | 25 | def title_case_to_spaced(string): 26 | # The regex pattern looks for any lowercase letter followed by an uppercase letter 27 | # and inserts a space between them 28 | return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', string) 29 | 30 | 31 | TootDisplayFn = Callable[[core.Toot, "RenderContext"], responses.Response] 32 | class TootDisplayPlugin(pydantic.BaseModel): 33 | fn: TootDisplayFn 34 | fn_name: str 35 | 36 | def render_str(self, toot: core.Toot, context: 
"RenderContext") -> str: 37 | obj = self.fn(toot, context) 38 | content = obj.body.decode("utf-8") 39 | return content 40 | 41 | 42 | class RenderContext(pydantic.BaseModel): 43 | """ 44 | A context object for rendering a template. 45 | """ 46 | class Config: 47 | arbitrary_types_allowed = True 48 | templates: templating.Jinja2Templates 49 | request: Request 50 | link_style: ui.LinkStyle 51 | session: core.Session 52 | 53 | def template_args(self) -> dict: 54 | return { 55 | "request": self.request, 56 | "link_style": self.link_style, 57 | "ctx": self, 58 | } 59 | 60 | def render_toot_display_plugins(self, toot: core.Toot) -> str: 61 | return "".join( 62 | plugin.render_str(toot, self) 63 | for plugin in get_toot_display_plugins() 64 | ) 65 | 66 | 67 | _app: FastAPI | None = None 68 | 69 | class _MenuItem(pydantic.BaseModel): 70 | html: str 71 | url: str 72 | 73 | 74 | class Plugin(pydantic.BaseModel): 75 | """ 76 | Plugin registration API 77 | 78 | Example: 79 | 80 | plugin = Plugin(name="My Plugin", description="Add button to toot that triggers an API POST operation") 81 | 82 | @plugin.api_operation.post("/my_plugin") 83 | def my_plugin(request: Request): 84 | return responses.HTMLResponse("
💯
") 85 | 86 | @plugin.toot_display_button 87 | def my_toot_display(toot: core.Toot, context: RenderContext): 88 | return responses.HTMLResponse("
💯
") 89 | 90 | """ 91 | name: str 92 | display_name: str | None = None 93 | description: str | None = None 94 | author: str | None = None 95 | author_url: str | None = None 96 | enabled_by_default: bool = True 97 | _toot_display_buttons: list[TootDisplayPlugin] = pydantic.PrivateAttr(default_factory=list) 98 | _algorithms: list[Type[algorithm.BaseAlgorithm]] = pydantic.PrivateAttr(default_factory=list) 99 | _lifecycle_hooks: list[callable] = pydantic.PrivateAttr(default_factory=list) 100 | _menu_items: list[_MenuItem] = pydantic.PrivateAttr(default_factory=list) 101 | _extra_nav: list[str] = pydantic.PrivateAttr(default_factory=list) 102 | _head_html: list[str] = pydantic.PrivateAttr(default_factory=list) 103 | 104 | @pydantic.validator("display_name", always=True) 105 | def _set_display_name(cls, v, values): 106 | return v or values["name"] 107 | 108 | @property 109 | def api_operation(self) -> FastAPI: 110 | assert _app is not None 111 | return _app 112 | 113 | @property 114 | def TemplateResponse(self) -> Type["server.templates.TemplateResponse"]: 115 | from fossil_mastodon import server 116 | return server.templates.TemplateResponse 117 | 118 | def toot_display_button(self, impl: TootDisplayFn) -> TootDisplayFn: 119 | """ 120 | Decorator for adding a button to the toot display UI. This function should return a 121 | fastapi.responses.Response object. The result will be extracted and inserted into the 122 | toot display UI. 
123 | """ 124 | name = impl.__name__ 125 | 126 | @functools.wraps(impl) 127 | def wrapper(toot: core.Toot, context: RenderContext): 128 | try: 129 | return impl(toot, context) 130 | except TypeError as e: 131 | raise BadPluginFunction(self, impl, "example_function(toot: fossil_mastodon.core.Toot, context: fossil_mastodon.plugins.RenderContext)") from e 132 | except Exception as e: 133 | import inspect 134 | print(inspect.signature(impl)) 135 | raise RuntimeError(f"Error in toot display plugin '{self.name}', function '{name}'") from e 136 | 137 | self._toot_display_buttons.append(TootDisplayPlugin(fn=wrapper, fn_name=name)) 138 | return wrapper 139 | 140 | def algorithm(self, algo: Type[algorithm.BaseAlgorithm]) -> Type[algorithm.BaseAlgorithm]: 141 | """ 142 | Decorator for adding an algorithm class. 143 | """ 144 | if not issubclass(algo, algorithm.BaseAlgorithm): 145 | raise ValueError(f"Algorithm {algo} is not a subclass of algorithm.BaseAlgorithm") 146 | self._algorithms.append(algo) 147 | algo.plugin = self 148 | return algo 149 | 150 | def lifecycle_hook(self, fn: callable) -> callable: 151 | """ 152 | Decorator for adding a lifecycle hook. A hook is a generator function that receives the FastAPI app: 153 | code before its `yield` runs when the server starts up, and code after the `yield` runs at shutdown. 154 | """ 155 | self._lifecycle_hooks.append(fn) 156 | return fn 157 | 158 | def add_templates_dir(self, path: pathlib.Path): 159 | """ 160 | Add a directory of templates to the plugin. These will be accessible from FastAPI response 161 | objects. For example, if you add a directory of templates at `/templates`, then you 162 | can return a template from a FastAPI route like this: 163 | 164 | @plugin.api_operation.get("/my_route") 165 | def my_route(request: Request): 166 | return plugin.TemplateResponse("my_template.html", {"request": request}) 167 | """ 168 | config.ASSETS.add_dir(path, "templates") 169 | 170 | def add_static_dir(self, path: pathlib.Path): 171 | """ 172 | Add a directory of static files to the plugin.
These will be downloadable by the browser at 173 | the path `GET /static/example.css`, assuming the example.css exists at `/example.css` 174 | as a local path. 175 | """ 176 | config.ASSETS.add_dir(path, "static") 177 | 178 | def add_menu_item(self, raw_html: str, url="#"): 179 | self._menu_items.append(_MenuItem(html=raw_html, url=url)) 180 | 181 | def add_extra_nav(self, raw_html: str): 182 | self._extra_nav.append(raw_html) 183 | 184 | def add_head_html(self, raw_html: str): 185 | self._head_html.append(raw_html) 186 | 187 | 188 | def init_plugins(app: FastAPI): 189 | global _app 190 | _app = app 191 | get_plugins() 192 | 193 | 194 | @functools.lru_cache 195 | def get_plugins() -> list[Plugin]: 196 | if _app is None: 197 | raise RuntimeError("Plugins not initialized") 198 | 199 | plugins = [] 200 | for entry_point in pkg_resources.iter_entry_points("fossil_mastodon.plugins"): 201 | print("Loading plugin", entry_point.name) 202 | try: 203 | plugin = entry_point.load() 204 | if isinstance(plugin, Plugin): 205 | plugins.append(plugin) 206 | else: 207 | print(f"Error loading plugin '{entry_point.name}': not an instance of Plugin") 208 | except Exception: 209 | print(f"Error loading plugin '{entry_point.name}'") 210 | traceback.print_exc() 211 | return plugins 212 | 213 | 214 | def get_toot_display_plugins() -> list[TootDisplayPlugin]: 215 | return [ 216 | b 217 | for p in get_plugins() 218 | for b in p._toot_display_buttons 219 | ] 220 | 221 | 222 | def get_algorithms() -> list[Type[algorithm.BaseAlgorithm]]: 223 | return [ 224 | algo 225 | for p in get_plugins() 226 | for algo in p._algorithms 227 | ] 228 | 229 | 230 | def get_menu_items() -> list[_MenuItem]: 231 | return [ 232 | algo 233 | for p in get_plugins() 234 | for algo in p._menu_items 235 | ] 236 | 237 | 238 | def get_extra_nav() -> list[str]: 239 | return [ 240 | algo 241 | for p in get_plugins() 242 | for algo in p._extra_nav 243 | ] 244 | 245 | 246 | def get_head_html() -> list[str]: 247 |
return [ 248 | algo 249 | for p in get_plugins() 250 | for algo in p._head_html 251 | ] 252 | 253 | 254 | def get_lifecycle_hooks() -> list[callable]: 255 | return [ 256 | contextlib.contextmanager(hook) 257 | for p in get_plugins() 258 | for hook in p._lifecycle_hooks 259 | ] 260 | 261 | @contextlib.asynccontextmanager 262 | async def lifespan(app: FastAPI): 263 | hooks = get_lifecycle_hooks() 264 | 265 | objects = [] 266 | for hook in hooks: 267 | try: 268 | obj = hook(app) 269 | obj.__enter__() 270 | objects.append(obj) 271 | except: 272 | logger.exception(f"Error running lifecycle hook {hook}") 273 | 274 | yield 275 | 276 | exc_info = sys.exc_info() 277 | exc = exc_info[1] if exc_info else None 278 | exc_type = exc_info[0] if exc_info else None 279 | tb = exc_info[2] if exc_info else None 280 | for obj in objects: 281 | try: 282 | obj.__exit__(exc_type, exc, tb) 283 | except: 284 | logger.exception(f"Error running teardown for lifecycle hook {obj}") 285 | 286 | 287 | class BadPluginFunction(Exception): 288 | def __init__(self, plugin: Plugin, function: callable, expected_signature: str): 289 | super().__init__(f"Bad function call: {plugin.name}.{function.__name__} should have signature {expected_signature}") 290 | self.plugin = plugin 291 | self.function = function 292 | self.signature = inspect.signature(function) 293 | self.expected_signature = expected_signature 294 | self.function_name = function.__name__ 295 | -------------------------------------------------------------------------------- /fossil_mastodon/science.py: -------------------------------------------------------------------------------- 1 | import llm 2 | import numpy as np 3 | import openai 4 | import tiktoken 5 | from sklearn.cluster import KMeans 6 | 7 | from . import config, core 8 | 9 | 10 | def assign_clusters(session_id: str, toots: list[core.Toot], n_clusters: int = 5): 11 | # meh, ignore toots without content.
I think this might be just an image, not sure 12 | toots = [toot for toot in toots if toot.embedding is not None] 13 | 14 | # Perform k-means clustering on the embeddings 15 | embeddings = np.array([toot.embedding for toot in toots]) 16 | kmeans = KMeans(n_clusters=n_clusters) 17 | cluster_labels = kmeans.fit_predict(embeddings) 18 | 19 | client = openai.OpenAI(api_key=config.ConfigHandler.OPENAI_KEY) 20 | for i_clusters in range(n_clusters): 21 | clustered_toots = [toot for toot, cluster_label in zip(toots, cluster_labels) if cluster_label == i_clusters] 22 | combined_text = "\n\n".join([toot.content for toot in clustered_toots]) 23 | 24 | # Use the session's configured summarization model to label the combined text 25 | prompt = f"Create a single label that describes all of these related tweets, make it succinct but descriptive. The label should describe all {len(clustered_toots)} of these\n\n{combined_text}" 26 | model = llm.get_model(config.ConfigHandler.SUMMARIZE_MODEL(session_id).name) 27 | summary = model.prompt(prompt).text() 28 | 29 | # Use the summary as the cluster label for every toot in this cluster 30 | for toot, cluster_label in zip(toots, cluster_labels): 31 | if cluster_label == i_clusters: 32 | toot.cluster = summary 33 | 34 | def get_encoding(session_id: str): 35 | try: 36 | return tiktoken.encoding_for_model(config.ConfigHandler.SUMMARIZE_MODEL(session_id).name) 37 | except KeyError: 38 | encoding_name = tiktoken.list_encoding_names()[-1] 39 | return tiktoken.get_encoding(encoding_name) 40 | 41 | def reduce_size(session_id: str, text: str, model_limit: int = -1, est_output_size: int = 500) -> str: 42 | if model_limit < 0: 43 | model_limit = config.ConfigHandler.SUMMARIZE_MODEL(session_id).context_length 44 | tokens = get_encoding(session_id).encode(text) 45 | return get_encoding(session_id).decode(tokens[:model_limit - est_output_size]) 46 | -------------------------------------------------------------------------------- /fossil_mastodon/server.py:
-------------------------------------------------------------------------------- 1 | """ 2 | A FastAPI HTML server. 3 | 4 | The streamlit version had issues around state management and was generally slow 5 | and inflexible. This gives us a lot more control. 6 | """ 7 | import datetime 8 | import importlib 9 | import json 10 | import logging 11 | import random 12 | import string 13 | from typing import Annotated, Type 14 | 15 | import llm 16 | import requests 17 | from fastapi import FastAPI, Form, HTTPException, Request, responses, staticfiles, templating 18 | 19 | from fossil_mastodon import algorithm, config, core, migrations, plugins, ui 20 | 21 | 22 | logger = logging.getLogger(__name__) 23 | 24 | 25 | app = FastAPI(lifespan=plugins.lifespan) 26 | 27 | 28 | app.mount("/static", staticfiles.StaticFiles(directory=config.ASSETS.assets_path), name="static") 29 | templates = templating.Jinja2Templates(directory=config.ASSETS.templates_path) 30 | print("using template directory", config.ASSETS.templates_path) 31 | templates.env.filters["rel_date"] = ui.time_ago 32 | 33 | 34 | @app.middleware("http") 35 | async def session_middleware(request: Request, call_next): 36 | """ 37 | Called before each request. Loads the session named by the `fossil_session_id` cookie, or creates and saves a new session and sets that cookie on the response.
38 | """ 39 | session_id = request.cookies.get("fossil_session_id") 40 | session = core.Session.get_by_id(session_id) if session_id else None 41 | if session is None: 42 | session = core.Session.get_or_create() 43 | session.save() 44 | request.state.session = session 45 | response = await call_next(request) 46 | response.set_cookie("fossil_session_id", session.id) 47 | return response 48 | else: 49 | request.state.session = session 50 | return await call_next(request) 51 | 52 | 53 | @app.get("/") 54 | async def root(request: Request): 55 | session: core.Session = request.state.session 56 | ctx = plugins.RenderContext( 57 | templates=templates, 58 | request=request, 59 | link_style=ui.LinkStyle("Desktop"), 60 | session=session, 61 | ) 62 | 63 | # GUARD: ensure some algorithms are installed 64 | algo_list = plugins.get_algorithms() 65 | if len(algo_list) == 0: 66 | print(f"No algorithms found (num plugins: {len(plugins.get_plugins())})") 67 | for plugin in plugins.get_plugins(): 68 | print(f"Plugin ({plugin.name})", plugin) 69 | return templates.TemplateResponse("no_algorithm.html", { 70 | "request": request, 71 | }) 72 | 73 | # Render the UI 74 | algo = session.get_algorithm_type() or algo_list[0] 75 | return templates.TemplateResponse("index.html", { 76 | "request": request, 77 | "model_params": algo.render_model_params(ctx).body.decode("utf-8"), 78 | "ui_settings": session.get_ui_settings(), 79 | "selected_algorithm": algo, 80 | "algorithms": [ 81 | {"name": algo.plugin.name, "display_name": algo.plugin.display_name} 82 | for algo in plugins.get_algorithms() 83 | ], 84 | }) 85 | 86 | 87 | @app.get("/toots") 88 | async def toots(): 89 | return staticfiles.FileResponse("public/toots.html") 90 | 91 | 92 | @app.post("/toots/download") 93 | async def toots_download(request: Request): 94 | # init 95 | migrations.create_database() 96 | session: core.Session = request.state.session 97 | algorithm_spec: dict = json.loads(session.algorithm_spec) if session.algorithm_spec 
else {} 98 | 99 | # first page load calls this with display-only=true to load what was loaded last time 100 | if request.query_params.get("display-only", "") != "true": 101 | # download 102 | core.download_timeline(datetime.datetime.utcnow() - datetime.timedelta(days=1), session.id) 103 | 104 | # render 105 | body_params: dict[str, str] = dict((await request.form())) 106 | session.set_ui_settings(body_params) 107 | print("algorithm_spec", algorithm_spec) 108 | if "module" in algorithm_spec and "class_name" in algorithm_spec: 109 | mod = importlib.import_module(algorithm_spec["module"]) 110 | model_class: Type[algorithm.BaseAlgorithm] = getattr(mod, algorithm_spec["class_name"]) 111 | model: algorithm.BaseAlgorithm = model_class.deserialize(session.algorithm) 112 | timespan = ui.timedelta(body_params["time_span"]) 113 | timeline = core.Toot.get_toots_since(datetime.datetime.utcnow() - timespan) 114 | renderable = model.render(timeline, plugins.RenderContext( 115 | templates=templates, 116 | request=request, 117 | link_style=ui.LinkStyle(body_params["link_style"] if "link_style" in body_params else "Desktop"), 118 | session=session, 119 | )) 120 | return renderable.render() 121 | else: 122 | return responses.HTMLResponse("
No Toots 😥
") 123 | 124 | 125 | @app.post("/toots/train") 126 | async def toots_train( 127 | link_style: Annotated[str, Form()], 128 | time_span: Annotated[str, Form()], 129 | request: Request, 130 | ): 131 | context = algorithm.TrainContext( 132 | end_time=datetime.datetime.utcnow(), 133 | timedelta=ui.timedelta(time_span), 134 | session_id=request.state.session.id 135 | ) 136 | 137 | algo_kwargs = {k: v for k, v in dict((await request.form())).items() 138 | if k not in {"link_style", "time_span"}} 139 | print("Algorithm kwargs:", algo_kwargs) 140 | 141 | # train 142 | session: core.Session = request.state.session 143 | algo = session.get_algorithm_type() or plugins.get_algorithms()[0] 144 | algo.model_version = "".join(random.choices(string.ascii_letters + string.digits, k=12)) 145 | model = algo.train(context, algo_kwargs) 146 | session.algorithm = model.serialize() 147 | session.algorithm_spec = json.dumps({ 148 | "module": model.__class__.__module__, 149 | "class_name": model.__class__.__qualname__, 150 | "kwargs": algo_kwargs, 151 | }) 152 | session.save() 153 | 154 | # render 155 | timeline = core.Toot.get_toots_since(datetime.datetime.utcnow() - ui.timedelta(time_span)) 156 | renderable = model.render(timeline, plugins.RenderContext( 157 | templates=templates, 158 | request=request, 159 | link_style=ui.LinkStyle(link_style), 160 | session=session, 161 | )) 162 | try: 163 | return renderable.render() 164 | except plugins.BadPluginFunction as ex: 165 | return templates.TemplateResponse("bad_plugin.html", { "request": request, "ex": ex }) 166 | 167 | 168 | @app.get("/algorithm/{name}/form") 169 | async def algorithm_form(name: str, request: Request): 170 | session: core.Session = request.state.session 171 | algo_type = session.get_algorithm_type() or plugins.get_algorithms()[0] 172 | ctx = plugins.RenderContext( 173 | templates=templates, 174 | request=request, 175 | link_style=ui.LinkStyle(session.get_ui_settings().get("link_style", "Desktop")), 176 | session=session, 
177 | ) 178 | return algo_type.render_model_params(ctx) 179 | 180 | 181 | @app.get("/settings") 182 | async def get_settings(request: Request): 183 | session: core.Session = request.state.session 184 | keys = {"openai": "", **llm.load_keys()} 185 | return templates.TemplateResponse("settings.html", { 186 | "request": request, 187 | "settings": session.settings, 188 | "embedding_models": config.get_installed_embedding_models(), 189 | "embedding_model": session.settings.embedding_model, 190 | "summarize_models": config.get_installed_llms(), 191 | "summarize_model": session.settings.summarize_model, 192 | "keys": keys, 193 | }) 194 | 195 | @app.post("/settings") 196 | async def post_settings(settings: core.Settings, request: Request): 197 | session: core.Session = request.state.session 198 | session.settings = settings 199 | session.save() 200 | return responses.HTMLResponse("
👍
") 201 | 202 | @app.post("/keys") 203 | async def post_keys(request: Request): 204 | body_params: dict[str, str] = dict((await request.form())) 205 | key_path = llm.user_dir() / "keys.json" 206 | key_path.write_text(json.dumps(body_params)) 207 | return responses.HTMLResponse("
👍
") 208 | 209 | 210 | @app.post("/toots/{id}/debug") 211 | async def toots_debug(id: int): 212 | toot = core.Toot.get_by_id(id) 213 | if toot is not None: 214 | import json 215 | print(json.dumps(toot.orig_dict, indent=2)) 216 | return responses.HTMLResponse("
💯
") 217 | 218 | @app.post("/toots/{id}/boost") 219 | async def toots_boost(id: int): 220 | toot = core.Toot.get_by_id(id) 221 | if toot is not None: 222 | url = f'{config.ConfigHandler.MASTO_BASE}/api/v1/statuses/{toot.toot_id}/reblog' 223 | data = { 224 | 'visibility': 'public' 225 | } 226 | response = requests.post(url, json=data, headers=config.headers()) 227 | try: 228 | response.raise_for_status() 229 | return responses.HTMLResponse("
🚀
") 230 | except: 231 | print("ERROR:", response.json()) 232 | raise 233 | raise HTTPException(status_code=404, detail="Toot not found") 234 | 235 | @app.post("/toots/{id}/favorite") 236 | async def toots_favorite(id: int): 237 | toot = core.Toot.get_by_id(id) 238 | if toot is not None: 239 | url = f'{config.ConfigHandler.MASTO_BASE}/api/v1/statuses/{toot.toot_id}/favourite' 240 | response = requests.post(url, headers=config.headers()) 241 | try: 242 | response.raise_for_status() 243 | return responses.HTMLResponse("
💫
") 244 | except: 245 | print("ERROR:", response.json()) 246 | raise 247 | raise HTTPException(status_code=404, detail="Toot not found") 248 | 249 | templates.env.globals["extra_menu_items"] = plugins.get_menu_items 250 | templates.env.globals["head_html"] = plugins.get_head_html 251 | templates.env.globals["extra_nav"] = plugins.get_extra_nav 252 | 253 | # this should always be the last line of this file 254 | plugins.init_plugins(app) -------------------------------------------------------------------------------- /fossil_mastodon/ui.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import re 3 | import urllib.parse 4 | 5 | import pydantic 6 | import streamlit as st 7 | 8 | from . import config, core 9 | 10 | 11 | def get_time_frame() -> datetime.timedelta: 12 | time_frame = st.radio("Show last:", ["6 hours", "day", "week"], horizontal=True) 13 | 14 | if time_frame == "6 hours": 15 | return datetime.timedelta(hours=6) 16 | elif time_frame == "day": 17 | return datetime.timedelta(days=1) 18 | elif time_frame == "week": 19 | return datetime.timedelta(weeks=1) 20 | raise ValueError("Invalid time frame") 21 | 22 | 23 | def time_ago(dt: datetime.datetime) -> str: 24 | current_time = datetime.datetime.utcnow() 25 | time_ago = current_time - dt 26 | 27 | # Convert the time difference to a readable string 28 | if time_ago < datetime.timedelta(minutes=1): 29 | time_ago_str = "just now" 30 | elif time_ago < datetime.timedelta(hours=1): 31 | minutes = int(time_ago.total_seconds() / 60) 32 | time_ago_str = f"{minutes} minutes ago" 33 | elif time_ago < datetime.timedelta(days=1): 34 | hours = int(time_ago.total_seconds() / 3600) 35 | time_ago_str = f"{hours} hours ago" 36 | else: 37 | days = time_ago.days 38 | time_ago_str = f"{days} days ago" 39 | 40 | return time_ago_str 41 | 42 | 43 | def timedelta(time_span: str) -> datetime.timedelta: 44 | hour_pattern = re.compile(r"(\d+)h") 45 | day_pattern = 
re.compile(r"(\d+)d") 46 | week_pattern = re.compile(r"(\d+)w") 47 | if m := hour_pattern.match(time_span): 48 | return datetime.timedelta(hours=int(m.group(1))) 49 | elif m := day_pattern.match(time_span): 50 | return datetime.timedelta(days=int(m.group(1))) 51 | elif m := week_pattern.match(time_span): 52 | return datetime.timedelta(weeks=int(m.group(1))) 53 | raise ValueError("Invalid time frame") 54 | 55 | 56 | class LinkStyle: 57 | def __init__(self, scheme: str | None = None): 58 | # ivory://acct/openURL?url= 59 | # {config.ConfigHandler.MASTO_BASE}/deck/@{toot.author}/{toot.toot_id} 60 | if scheme: 61 | self.scheme = scheme 62 | else: 63 | self.scheme = st.radio("Link scheme:", ["Desktop", "Ivory", "Original"], index=1, horizontal=True) 64 | 65 | def toot_url(self, toot: core.Toot) -> str: 66 | if self.scheme == "Desktop": 67 | return f"{config.ConfigHandler.MASTO_BASE}/@{toot.author}/{toot.toot_id}" 68 | elif self.scheme == "Ivory": 69 | encoded_url = urllib.parse.quote(toot.url) 70 | return f"ivory://acct/openURL?url={encoded_url}" 71 | elif self.scheme == "Original": 72 | return toot.url 73 | raise ValueError("Invalid scheme") 74 | 75 | def profile_url(self, toot: core.Toot) -> str: 76 | if self.scheme == "Desktop": 77 | return f"{config.ConfigHandler.MASTO_BASE}/@{toot.author}" 78 | elif self.scheme == "Ivory": 79 | # return f"ivory://@{toot.author}/profile" 80 | return f"ivory://acct/openURL?url={toot.profile_url}" 81 | elif self.scheme == "Original": 82 | return toot.profile_url 83 | raise ValueError("Invalid scheme") 84 | 85 | 86 | 87 | class TootCluster(pydantic.BaseModel): 88 | id: int 89 | name: str 90 | toots: list[core.Toot] 91 | 92 | 93 | class TootClusters(pydantic.BaseModel): 94 | clusters: list[TootCluster] 95 | 96 | @property 97 | def num_toots(self) -> int: 98 | return sum(len(c.toots) for c in self.clusters) 99 | 100 | @property 101 | def max_date(self) -> datetime.datetime: 102 | seq = [t.created_at for c in self.clusters for t in
c.toots] 103 | return max(seq) if len(seq) > 0 else datetime.datetime.utcnow() 104 | 105 | @property 106 | def min_date(self) -> datetime.datetime: 107 | seq = [t.created_at for c in self.clusters for t in c.toots] 108 | return min(seq) if len(seq) > 0 else datetime.datetime.utcnow() 109 | 110 | 111 | def display_toot(toot: core.Toot, link_style: LinkStyle): 112 | with st.container(border=True): 113 | reply = "↩" if toot.is_reply else "" 114 | st.markdown(f""" 115 | {reply}{toot.display_name} @{toot.author} ({time_ago(toot.created_at)}) 116 | {toot.content} 117 | """, unsafe_allow_html=True) 118 | 119 | attachments = [f'' for a in toot.media_attachments] 120 | st.markdown(" ".join(attachments), unsafe_allow_html=True) 121 | 122 | cols = st.columns(4) 123 | with cols[0]: 124 | st.markdown(f"""🔗""", unsafe_allow_html=True) 125 | with cols[1]: 126 | if st.button("⭐️", key=f"star-{toot.id}"): 127 | toot.do_star() 128 | with cols[2]: 129 | if st.button("️🔁", key=f"boost-{toot.id}"): 130 | toot.do_boost() 131 | with cols[3]: 132 | if st.button("🪲", key=f"delete-{toot.id}"): 133 | import json 134 | print(json.dumps(toot.orig_dict, indent=2)) 135 | 136 | 137 | def all_toot_summary(toots: list[core.Toot]): 138 | latest_date = max(t.created_at for t in toots) 139 | earliest_date = min(t.created_at for t in toots) 140 | now = datetime.datetime.utcnow() 141 | msg = f"{len(toots)} toots from {time_ago(earliest_date)} to {time_ago(latest_date)}" 142 | if latest_date > now: 143 | st.warning(msg) 144 | else: 145 | st.info(msg) -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | My App 19 | 20 | 21 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 103 | 104 | 
105 | 106 | 109 | 110 | 111 | 112 | -------------------------------------------------------------------------------- /make.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | function update_deps() { 4 | curl 'https://unpkg.com/htmx.org@latest' -o app/public/htmx.js 5 | } 6 | 7 | function run() { 8 | poetry run uvicorn --host 0.0.0.0 --port 8888 --reload --reload-include '*.html' --reload-include '*.css' fossil_mastodon.server:app 9 | } 10 | 11 | 12 | $1 "$@" 13 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "fossil-mastodon" 3 | version = "0.4.0-dev" 4 | description = "A mastodon reader client that uses embeddings to present a consolidated view of my mastodon timeline" 5 | authors = ["Tim Kellogg "] 6 | license = "MIT" 7 | readme = "README.md" 8 | include = ["**/*.css", "**/*.js", "**/*.html"] 9 | 10 | [tool.poetry.dependencies] 11 | python = "^3.10" 12 | requests = "^2.31.0" 13 | streamlit = "^1.29.0" 14 | scikit-learn = "^1.3.2" 15 | html2text = "^2020.1.16" 16 | tiktoken = "^0.5.2" 17 | python-dotenv = "^1.0.0" 18 | fastapi = "^0.105.0" 19 | jinja2 = "^3.1.2" 20 | uvicorn = "^0.25.0" 21 | python-multipart = "^0.0.6" 22 | llm = "^0.12" 23 | 24 | # You can use this same format to register your own plugins from a different project 25 | [tool.poetry.plugins."fossil_mastodon.plugins"] 26 | topic_cluster = "fossil_mastodon.plugin_impl.topic_cluster:plugin" 27 | debug_button = "fossil_mastodon.plugin_impl.toot_debug:plugin" 28 | 29 | 30 | [tool.poetry.group.dev.dependencies] 31 | watchdog = "^3.0.0" 32 | watchfiles = "^0.21.0" 33 | 34 | 35 | [[tool.poetry.source]] 36 | name = "PyPI" 37 | priority = "primary" 38 | 39 | [build-system] 40 | requires = ["poetry-core"] 41 | build-backend = "poetry.core.masonry.api" 42 |
--------------------------------------------------------------------------------
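`Plugin` in `fossil_mastodon/plugins.py` is a registration object. Its decorators (`toot_display_button`, `algorithm`, `lifecycle_hook`) record the decorated function or class in a private list, and module-level helpers such as `get_toot_display_plugins()` flatten those lists across every loaded plugin. Below is a stripped-down sketch of that pattern without pydantic or FastAPI. `MiniPlugin`, `collect_buttons`, and `render_buttons` are hypothetical names. Toots are plain dicts here, and unlike the original, the decorator returns the function unchanged rather than an error-translating wrapper.

```python
from dataclasses import dataclass, field
from typing import Callable

ButtonFn = Callable[[dict], str]


@dataclass
class MiniPlugin:
    """Hypothetical stand-in for fossil_mastodon.plugins.Plugin."""
    name: str
    _buttons: list[ButtonFn] = field(default_factory=list)

    def toot_display_button(self, fn: ButtonFn) -> ButtonFn:
        # Record the function and hand it back unchanged so it stays directly callable.
        self._buttons.append(fn)
        return fn


def collect_buttons(plugins: list[MiniPlugin]) -> list[ButtonFn]:
    # Flatten per-plugin registrations in plugin order, like get_toot_display_plugins().
    return [fn for p in plugins for fn in p._buttons]


def render_buttons(plugins: list[MiniPlugin], toot: dict) -> str:
    # Concatenate every button's HTML, like RenderContext.render_toot_display_plugins().
    return "".join(fn(toot) for fn in collect_buttons(plugins))


debug = MiniPlugin(name="Debug")


@debug.toot_display_button
def debug_button(toot: dict) -> str:
    return f"<button>debug {toot['id']}</button>"
```

Because registration happens at import time, loading a plugin module is enough to make its buttons visible to the renderer. This is why `get_plugins()` only needs to load each entry point.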
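`toot_display_button` turns any `TypeError` from the wrapped call into a `BadPluginFunction`. That also catches `TypeError`s raised *inside* a correctly declared plugin function and misreports them as signature problems. One alternative is to check the signature once at registration time with `inspect.signature(...).bind`, which applies the same argument-matching rules as a real call without running the function. The sketch below assumes display functions must accept exactly `(toot, context)` positionally. `check_display_signature` is a hypothetical helper.

```python
import inspect
from typing import Callable


def check_display_signature(fn: Callable) -> None:
    """Raise TypeError now if fn cannot be called as fn(toot, context) (hypothetical helper)."""
    try:
        # bind() matches arguments against parameters exactly as a call would, without calling fn.
        inspect.signature(fn).bind(object(), object())
    except TypeError as e:
        raise TypeError(
            f"{fn.__name__} should accept (toot, context), got {inspect.signature(fn)}"
        ) from e


def good(toot, context):
    return "ok"


def bad(toot):
    return "missing context"
```

With a check like this at decoration time, the runtime wrapper could let `TypeError` propagate like any other plugin error.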
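`lifespan` wraps each lifecycle hook with `contextlib.contextmanager`, so a hook is a generator that receives the FastAPI app and yields once. Setup runs before the `yield` and teardown runs after it. The loop calls `__enter__`/`__exit__` by hand and has no `try`/`finally` around its own `yield`. As a result, teardown is skipped if an exception is thrown into the lifespan at that point (for example, on cancellation), and hooks are torn down in startup order rather than reverse order. `contextlib.ExitStack` handles both cases. The sketch below takes the hooks as an argument instead of calling `get_lifecycle_hooks()`, and `make_lifespan` is a hypothetical name.

```python
import contextlib
import logging
from typing import Any, Callable, Iterator

logger = logging.getLogger(__name__)

Hook = Callable[[Any], Iterator[None]]  # generator function that takes the app and yields once


def make_lifespan(hooks: list[Hook]):
    """Build an async lifespan that runs generator-style hooks (hypothetical helper)."""

    @contextlib.asynccontextmanager
    async def lifespan(app: Any):
        with contextlib.ExitStack() as stack:
            for hook in hooks:
                try:
                    # Runs the hook up to its yield and registers its teardown on the stack.
                    stack.enter_context(contextlib.contextmanager(hook)(app))
                except Exception:
                    # A hook that fails during startup is logged and skipped; others still run.
                    logger.exception("Error starting lifecycle hook %s", hook.__name__)
            # On exit, registered teardowns run in reverse order, even if an error is propagating.
            yield

    return lifespan
```

One behavioral difference from the original: if a teardown itself raises, `ExitStack` still runs the remaining teardowns but then re-raises the error instead of only logging it.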
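`reduce_size` in `fossil_mastodon/science.py` truncates text to `model_limit - est_output_size` tokens, so that the prompt plus an estimated completion fits the summarization model's context window. The budget arithmetic is easy to get wrong. If `model_limit` were ever left at its `-1` default, the slice would become `tokens[:-501]`, which drops the *last* 501 tokens instead of capping the length. The sketch below makes the budget explicit and rejects non-positive values. To run standalone, it swaps tiktoken for a whitespace tokenizer. `truncate_to_budget`, `encode_words`, and `decode_words` are hypothetical names.

```python
from typing import Callable


def truncate_to_budget(
    text: str,
    encode: Callable[[str], list],
    decode: Callable[[list], str],
    model_limit: int,
    est_output_size: int = 500,
) -> str:
    """Keep at most model_limit - est_output_size tokens of text (hypothetical helper)."""
    budget = model_limit - est_output_size
    if budget <= 0:
        # A negative slice bound would cut tokens from the end instead of capping the length.
        raise ValueError(f"no room for input: model_limit={model_limit}, est_output_size={est_output_size}")
    return decode(encode(text)[:budget])


# Stand-in tokenizer for illustration only; the real code uses a tiktoken encoding.
def encode_words(text: str) -> list:
    return text.split()


def decode_words(tokens: list) -> str:
    return " ".join(tokens)
```

With tiktoken installed, the real encoder plugs in directly, e.g. `enc = tiktoken.get_encoding("cl100k_base")` followed by `truncate_to_budget(text, enc.encode, enc.decode, model_limit=4096)`.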
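The `timedelta` helper in `fossil_mastodon/ui.py` turns form values such as `6h`, `1d`, or `2w` into a `datetime.timedelta`. It uses `re.match`, which anchors only at the start of the string, so a value like `6hours` is also accepted as six hours. The sketch below is a stricter variant. It assumes the same `<number><unit>` format, uses one pattern with `re.fullmatch`, and maps the unit letter to a `timedelta` keyword. `parse_time_span` is a hypothetical name.

```python
import datetime
import re

# One pattern for all three units; fullmatch rejects trailing text such as "6hours".
_SPAN_RE = re.compile(r"(\d+)([hdw])")
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_time_span(time_span: str) -> datetime.timedelta:
    """Parse strings like '6h', '1d', or '2w' into a timedelta (hypothetical helper)."""
    m = _SPAN_RE.fullmatch(time_span.strip())
    if m is None:
        raise ValueError(f"Invalid time frame: {time_span!r}")
    amount, unit = int(m.group(1)), m.group(2)
    return datetime.timedelta(**{_UNITS[unit]: amount})


print(parse_time_span("6h"))  # 6:00:00
```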