├── LICENSE
├── README.md
├── main.py
├── requirements.txt
└── templates
    └── index.html

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Yohei Nakajima

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# airtable_to_graph

This Python script retrieves data from an Airtable base, identifies relational columns, and builds a knowledge graph using the retrieved data. The knowledge graph is then visualized using the pyvis library and displayed in a web application built with Flask.

It works well with small Airtables but I'm having some issues when the Airtable is too big.
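The core of what the script does, identifying relational columns and turning them into graph edges, can be sketched in plain Python without calling Airtable or pyvis. This is a minimal sketch, assuming records shaped like Airtable API output (`{'id': ..., 'fields': {...}}`); the sample tables, record IDs, and the `build_edges` helper are hypothetical and not part of `main.py`. It uses the same `startswith('rec')` heuristic as the script.

```python
# Minimal sketch: detect relational (linked-record) columns and derive edges.
# Assumption: records look like Airtable API output {'id': ..., 'fields': {...}}.
# The sample data and build_edges() are hypothetical illustrations.


def find_relational_columns(tables):
    """Return {table_name: set_of_fields} for fields holding lists of 'rec...' IDs."""
    relational = {}
    for table_name, records in tables.items():
        for record in records:
            # Check every field of every record so no link column is missed
            for field, value in record['fields'].items():
                if (isinstance(value, list) and value
                        and isinstance(value[0], str) and value[0].startswith('rec')):
                    relational.setdefault(table_name, set()).add(field)
    return relational


def build_edges(tables, relational_columns):
    """Return (source_id, target_id, column) triples, skipping unknown targets."""
    known_ids = {r['id'] for records in tables.values() for r in records}
    edges = []
    for table_name, columns in relational_columns.items():
        for column in sorted(columns):  # sorted for a deterministic edge order
            for record in tables[table_name]:
                for related_id in record['fields'].get(column, []):
                    if related_id in known_ids:
                        edges.append((record['id'], related_id, column))
    return edges


# Hypothetical sample base with three tables
tables = {
    'People': [
        {'id': 'recP1', 'fields': {'Name': 'Ada', 'Projects': ['recJ1'], 'Teams': ['recT1']}},
        {'id': 'recP2', 'fields': {'Name': 'Alan'}},
    ],
    'Projects': [
        {'id': 'recJ1', 'fields': {'Name': 'Engine', 'People': ['recP1']}},
    ],
    'Teams': [
        {'id': 'recT1', 'fields': {'Name': 'Core'}},
    ],
}

relational = find_relational_columns(tables)
print(build_edges(tables, relational))
# [('recP1', 'recJ1', 'Projects'), ('recP1', 'recT1', 'Teams'), ('recJ1', 'recP1', 'People')]
```

Ada's record contributes both of its link fields because every field of every record is checked rather than stopping at the first match; the `Name` fields are plain strings, so they are not treated as links.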

## Prerequisites

Before running the script, ensure that you have the following:

- Python 3.x installed
- Required Python packages: `requests`, `flask`, `pyairtable`, `pyvis`
- An Airtable API key (a personal access token; see instructions below)
- The ID of the Airtable base you want to visualize

## Installation

1. Clone the repository:

   ```
   git clone https://github.com/yoheinakajima/airtable_to_graph.git
   ```

2. Navigate to the project directory:

   ```
   cd airtable_to_graph
   ```

3. Install the required Python packages:

   ```
   pip install -r requirements.txt
   ```

## Obtaining Airtable API Key

To retrieve data from Airtable, you need to provide an Airtable API key. Here's how to create one:

1. Log in to your Airtable account.
2. Open the Developer Hub by clicking your profile picture in the top-right corner and selecting "Developer Hub".
3. On the personal access tokens page, click the "Create New Token" button in the top-right corner.
4. Give the token a name and grant it the `data.records:read` and `schema.bases:read` scopes.
5. Add all bases, or select only the base you want to connect.
6. Click the "Create Token" button and copy the generated token. This is the API key used below.

## Configuration

1. Set the `AIRTABLE_API_KEY` environment variable to your Airtable API key:

   ```
   export AIRTABLE_API_KEY=your_api_key_here
   ```

   Replace `your_api_key_here` with your actual Airtable API key.

2. Open `main.py` and set `AIRTABLE_BASE_ID` to the ID of the Airtable base you want to retrieve data from. The base ID starts with `app` and appears in the base's URL.

## Usage

1. Run the script:

   ```
   python main.py
   ```

2. Open a web browser and navigate to `http://localhost:5000` to view the knowledge graph.

## How It Works

1. Each time `/` is requested, the script retrieves all table names from the configured Airtable base using the base schema endpoint of the Airtable API.
2. It retrieves the records of every table in parallel using `ThreadPoolExecutor`.
3. The script identifies relational (linked-record) columns by looking for fields whose value is a list whose first element is a string starting with "rec", the prefix of Airtable record IDs.
4. It builds a knowledge graph using the `pyvis` library, representing each record as a node labeled with its `Name` field and creating an edge from each record to every record it links to through a relational column.
5. The knowledge graph is saved as `graph.html` in the `static` directory.
6. The script uses Flask to serve the knowledge graph in a web application (rendered through `templates/index.html`), which can be accessed at `http://localhost:5000`.

## Customization

- You can customize the appearance of the knowledge graph by changing the `height`, `width`, `bgcolor`, and `font_color` arguments passed to `Network` in the `build_knowledge_graph` function.
- To change the shape of the nodes, modify the `shape` argument of the `net.add_node` call.

## Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

## License

This project is licensed under the [MIT License](LICENSE).
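Step 2 of "How It Works" relies on `ThreadPoolExecutor.map` returning results in the same order as its inputs, which is what makes pairing each table name with its records via `zip` correct. The sketch below demonstrates this with a hypothetical `fake_get_airtable_data` function (not part of this repository) that stands in for the real Airtable call so it runs without network access or an API key, and deliberately makes one table slower than the others.

```python
# Minimal sketch of the parallel fetch step. fake_get_airtable_data is a
# hypothetical stand-in for get_airtable_data; no Airtable call is made.
import time
from concurrent.futures import ThreadPoolExecutor


def fake_get_airtable_data(table_name):
    """Hypothetical per-table fetch; 'People' is made slower on purpose."""
    time.sleep(0.05 if table_name == 'People' else 0.01)
    return [{'id': f'rec_{table_name}', 'fields': {'Name': table_name}}]


table_names = ['People', 'Projects', 'Teams']

with ThreadPoolExecutor() as executor:
    # executor.map yields results in the order of table_names, even if
    # 'People' finishes last, so zipping names with results is safe.
    tables = dict(zip(table_names, executor.map(fake_get_airtable_data, table_names)))

print(list(tables))         # ['People', 'Projects', 'Teams']
print(tables['People'][0])  # {'id': 'rec_People', 'fields': {'Name': 'People'}}
```

Because results come back in input order, no extra bookkeeping is needed to match records to their table. Using `executor.submit` with `as_completed` would instead yield results in completion order, so each future would need to carry its table name.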

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import requests
from flask import Flask, render_template
from pyairtable import Table
from pyvis.network import Network
import os
from concurrent.futures import ThreadPoolExecutor

app = Flask(__name__)

# Airtable configuration
AIRTABLE_API_KEY = os.environ['AIRTABLE_API_KEY']
AIRTABLE_BASE_ID = ''  # put your Airtable base ID here (it starts with "app")

# Retrieve all records of one table from Airtable
def get_airtable_data(table_name):
    table = Table(AIRTABLE_API_KEY, AIRTABLE_BASE_ID, table_name)
    return table.all()

# Identify relational columns: fields whose value is a list of record IDs ("rec...")
def find_relational_columns(tables):
    print("finding relational columns")
    relational_columns = {}
    for table_name, records in tables.items():
        for record in records:
            for field, value in record['fields'].items():
                if isinstance(value, list) and len(value) > 0 and isinstance(value[0], str) and value[0].startswith('rec'):
                    # Keep scanning the remaining fields: a record can have
                    # several linked-record columns, and each one must be found
                    relational_columns.setdefault(table_name, set()).add(field)
    return relational_columns

# Build knowledge graph
def build_knowledge_graph(tables, relational_columns):
    print("start building knowledge graphs")
    net = Network(height='1000px', width='100%', bgcolor='#222222', font_color='black')
    nodes = {}

    # One node per record, labeled with its 'Name' field and grouped by table
    for table_name, records in tables.items():
        for record in records:
            node_id = record['id']
            node_label = record['fields'].get('Name', '')
            nodes[node_id] = (node_label, table_name)

    for node_id, (node_label, table_name) in nodes.items():
        net.add_node(node_id, label=node_label, title=node_label, group=table_name, shape='box')

    # One edge per linked record in each relational column
    for table_name, columns in relational_columns.items():
        for column in columns:
            for record in tables[table_name]:
                if column in record['fields']:
                    for related_id in record['fields'][column]:
                        # pyvis rejects edges to nodes that were never added,
                        # so skip IDs that are not records fetched from this base
                        if related_id in nodes:
                            net.add_edge(record['id'], related_id, title=column)

    return net

@app.route('/')
def index():
    # Get all table names from the base schema
    url = f"https://api.airtable.com/v0/meta/bases/{AIRTABLE_BASE_ID}/tables"
    headers = {'Authorization': f'Bearer {AIRTABLE_API_KEY}'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # fail clearly on a bad API key or base ID
    base_schema = response.json()
    print(base_schema)
    table_names = [table['name'] for table in base_schema['tables']]

    # Retrieve data from Airtable using parallel processing (one task per table);
    # executor.map returns results in the same order as table_names
    with ThreadPoolExecutor() as executor:
        tables = dict(zip(table_names, executor.map(get_airtable_data, table_names)))

    # Identify relational columns
    relational_columns = find_relational_columns(tables)

    # Build knowledge graph
    net = build_knowledge_graph(tables, relational_columns)

    # Save graph as HTML file, creating the static directory if it is missing
    os.makedirs('static', exist_ok=True)
    net.save_graph('static/graph.html')

    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
requests==2.28.2
Flask==2.3.2
pyairtable==1.2.0
pyvis==0.3.2

--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------