├── .gitignore
├── Dockerfile
├── README.md
├── main.py
├── requirements.txt
└── templates
    └── index.html


/.gitignore:
--------------------------------------------------------------------------------
venv/
.env
__pycache__

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy requirements.txt into the container
COPY requirements.txt .

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt

# Copy the rest of the application code into the container
COPY . .

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Disable Python output buffering so logs appear immediately
ENV PYTHONUNBUFFERED=1

# Run the command to start the FastAPI app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
**DEPRECATED**: This repository is still useful for context and to see an initial implementation of this idea. However, this work is being continued in https://github.com/developmentseed/haystac with a more "proper" structure using LangChain.

### Implement the ReAct pattern to connect an LLM with a STAC API endpoint

This is inspired by Simon Willison's blog post: https://til.simonwillison.net/llms/python-react-pattern

The idea here is to develop a natural language interface to a STAC API endpoint, currently the Microsoft Planetary Computer STAC Catalog.
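
As a rough illustration of the ReAct pattern, the sketch below runs a Thought / Action / PAUSE / Observation loop against a scripted stand-in for the LLM. The names `fake_llm`, `lookup_bbox` and `run_react` are hypothetical and exist only for this example, and the bounding box it returns is a made-up placeholder, not real geocoder output.

```python
import re

# Matches lines such as "Action: lookup_bbox: Seattle".
ACTION_RE = re.compile(r"^Action: (\w+): (.*)$")


def lookup_bbox(place):
    # Hypothetical tool: returns a made-up bounding box instead of calling a geocoder.
    return "bbox for {} is [0.0, 0.0, 1.0, 1.0]".format(place)


TOOLS = {"lookup_bbox": lookup_bbox}


def fake_llm(messages):
    # Scripted stand-in for a chat model: requests one action, then answers.
    last = messages[-1]
    if last.startswith("Observation:"):
        return "Answer: " + last[len("Observation: "):]
    return "Thought: I need a bounding box.\nAction: lookup_bbox: Seattle\nPAUSE"


def run_react(question, llm=fake_llm, max_turns=5):
    messages = [question]
    for _ in range(max_turns):
        reply = llm(messages)
        messages.append(reply)
        matches = [m for m in map(ACTION_RE.match, reply.split("\n")) if m]
        if not matches:
            return reply  # no action requested, so treat this as the final answer
        name, arg = matches[0].groups()
        if name not in TOOLS:
            raise ValueError("Unknown action: {}".format(name))
        # Feed the tool result back to the model as the next message.
        messages.append("Observation: {}".format(TOOLS[name](arg)))
    return None  # no final answer within max_turns


print(run_react("Where is Seattle?"))
```

Swapping `fake_llm` for a real chat-completion call and `TOOLS` for real lookups gives the general shape of the loop; the scripted model just makes the control flow easy to follow.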

The code is currently very rudimentary and experimental, but already shows promising results.

### How to run

Set the environment variables OPENAI_API_KEY and OPENCAGE_API_KEY to your OpenAI and OpenCage API keys (`main.py` raises an exception on import if either is missing).

```
python

import asyncio
from main import query

> asyncio.run(query("Can you get me satellite imagery for Seattle for 10th December, 2018?"))

Observation: The STAC query returns a list of assets that are available for the given parameters of the bounding box and datetime, which includes imagery from NOAA GOES satellite (GLM-L2-LCFA/2018/345/00) as well as MODIS collection 6.1 (MYD21A2.A2018345.h09v04.061.2021350231530) which has several different assets available, including metadata and various thermal bands. The rendered preview image can be viewed at https://planetarycomputer.microsoft.com/api/data/v1/item/preview.png?collection=modis-21A2-061&item=MYD21A2.A2018345.h09v04.061.2021350231530&assets=LST_Day_1KM&tile_format=png&colormap_name=jet&rescale=255%2C310&format=png
```

In the above example, ChatGPT queries Wikipedia, gets the bounding box for Seattle, and uses it to construct a STAC API query for that bounding box and datetime range. It currently only processes the first two results returned, but this can be easily improved.


### TODO

This is a very rough, quick-and-dirty PoC. To improve this:

 - Move from Wikipedia to a real geocoder to fetch bounding boxes for a place
 - Allow it to use more complex STAC search functionality
 - Format the STAC search result object more appropriately before sending it back to ChatGPT to interpret.
 - Augment ChatGPT's natural language answer with all the links, etc. from the actual STAC API response.
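
One possible approach to the result-formatting item above is to reduce each STAC item to a handful of fields before handing it back to the model. The sketch below is not part of this repository: `summarize_items` is a hypothetical helper, the sample data is invented, and the `rendered_preview` asset key is an assumption about how the catalog names its preview asset.

```python
def summarize_items(item_collection, limit=5):
    """Reduce a GeoJSON FeatureCollection of STAC items to a few fields per item."""
    summaries = []
    for item in item_collection.get("features", [])[:limit]:
        assets = item.get("assets", {})
        summaries.append({
            "id": item.get("id"),
            "collection": item.get("collection"),
            "datetime": item.get("properties", {}).get("datetime"),
            "asset_keys": sorted(assets),
            # Assumed asset key; None if the item has no such asset.
            "preview": assets.get("rendered_preview", {}).get("href"),
        })
    return summaries


# Invented sample data, shaped like a STAC item collection.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "id": "item-1",
            "collection": "example-collection",
            "properties": {"datetime": "2018-12-10T00:00:00Z"},
            "assets": {
                "rendered_preview": {"href": "https://example.com/preview.png"},
                "B01": {"href": "https://example.com/B01.tif"},
            },
        }
    ],
}

print(summarize_items(sample))
```

A compact summary like this keeps the prompt short, while the full API response can still be returned to the caller alongside the model's answer.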



## Server setup
This is now wrapped in a lightweight FastAPI application:

* `docker build -t fastapi-chatgpt-app .`
* `docker run -p 8000:8000 -e OPENAI_API_KEY=$OPENAI_API_KEY -e OPENCAGE_API_KEY=$OPENCAGE_API_KEY fastapi-chatgpt-app`
* Send a request like this: `http://localhost:8000/chatgpt?prompt=%22find%20me%20satellite%20imagery%20in%20Bangalore%20for%20December%2014,%202017%22`

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
# This code is Apache 2 licensed:
# https://www.apache.org/licenses/LICENSE-2.0
import openai
import re
import httpx
from pystac_client import Client
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

from opencage.geocoder import OpenCageGeocode
import json
import os

app = FastAPI()

# Fail fast at import time if either API key is missing.
if 'OPENAI_API_KEY' not in os.environ:
    raise Exception("OPENAI_API_KEY must be defined in your environment")

if 'OPENCAGE_API_KEY' not in os.environ:
    raise Exception("OPENCAGE_API_KEY is required")

openai.api_key = os.environ['OPENAI_API_KEY']
stac_endpoint = "https://planetarycomputer.microsoft.com/api/stac/v1"

geocoder_key = os.environ['OPENCAGE_API_KEY']
geocoder = OpenCageGeocode(geocoder_key)

app.mount("/templates", StaticFiles(directory="templates"), name="templates")

@app.get("/status")
def health():
    return {"status": "success"}

@app.get("/chatgpt")
async def chatgpt(prompt: str):
    return await query(prompt)


class ChatBot:
    """Keeps the running message history for one conversation."""

    def __init__(self, system=""):
        self.system = system
        self.messages = []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    async def __call__(self, message):
        self.messages.append({"role": "user", "content": message})
        result = await self.execute()
        self.messages.append({"role": "assistant", "content": result})
        return result

    async def execute(self):
        # acreate is the async variant of create in the pinned openai 0.27.x,
        # so the request does not block the event loop.
        completion = await openai.ChatCompletion.acreate(model="gpt-3.5-turbo", messages=self.messages)
        # Uncomment this to print out token usage each time, e.g.
        # {"completion_tokens": 86, "prompt_tokens": 26, "total_tokens": 112}
        # print(completion.usage)
        return completion.choices[0].message.content

prompt = """
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

The questions will involve getting satellite imagery out of a STAC catalog.

To resolve the question, you have the following tools available that you can use.

Your available actions are:

calculate:
e.g. calculate: 4 * 7 / 3
Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary

wikipedia:
e.g. wikipedia: Mumbai
Returns a summary from searching Wikipedia

stac:
e.g. stac: bbox=[-73.21, 43.99, -73.12, 44.05] && datetime=['2019-01-01T00:00:00Z', '2019-01-02T00:00:00Z']
Will query the Microsoft Planetary Computer STAC endpoint for STAC records for that bbox and datetime and return a JSON representation of the item assets returned from the STAC API.
Please ensure the STAC query is entered exactly as above, with a bbox representing the lat / lng extents of the area and the datetime representing timestamps for start and end time to search the catalog within. Always return the rendered preview URL from the items.

Please remember that these are your only three available actions. Do not attempt to put any word after "Action: " other than wikipedia, calculate or stac. THIS IS A HARD RULE. DO NOT BREAK IT AT ANY COST. DO NOT MAKE UP YOUR OWN ACTIONS.

Always look things up on Wikipedia if you have the opportunity to do so.

Example session:

Question: Can you point me to satellite images for 2019 January, for the capital of France?
Thought: I should look up France on Wikipedia, find its capital, and find its bounding box extent.
Action: wikipedia: France
PAUSE

You will be called again with this:

Observation: France is a country. The capital is Paris. Its bbox is [27, 54, 63, 32.5]

You then output:

Thought: I should now query the STAC catalog to fetch data about satellite images of Paris.

Action: stac: bbox=[27, 54, 63, 32.5] && datetime=['2019-01-01T00:00:00Z', '2019-01-02T00:00:00Z']
PAUSE

You will be called again with the output from the STAC query as JSON. Use that to give the user information about what is available on STAC for their query.

""".strip()


# Raw string so that \w is a regex class rather than a string escape.
action_re = re.compile(r'^Action: (\w+): (.*)$')

async def query(question, max_turns=5):
    i = 0
    bot = ChatBot(prompt)
    next_prompt = question
    while i < max_turns:
        i += 1
        result = await bot(next_prompt)
        print(result)
        actions = [action_re.match(a) for a in result.split('\n') if action_re.match(a)]
        if actions:
            # There is an action to run
            action, action_input = actions[0].groups()
            if action not in known_actions:
                raise Exception("Unknown action: {}: {}".format(action, action_input))
            print(" -- running {} {}".format(action, action_input))
            observation = await known_actions[action](action_input)
            print("Observation:", observation)

            # If the action is querying the STAC API, just return the results, don't re-prompt
            if action == 'stac':
                return observation
            else:
                next_prompt = "Observation: {}".format(observation)
        else:
            return result


async def stac(q):
    bbox_match = re.search(r'bbox=\[(.*?)\]', q)
    datetime_match = re.search(r'datetime=\[(.*?)\]', q)

    # Check if bbox and datetime arrays are found
    if bbox_match and datetime_match:
        bbox_str = bbox_match.group(1)
        datetime_str = datetime_match.group(1)

        # Remove spaces after commas, if any
        bbox_str = bbox_str.replace(', ', ',')
        datetime_str = datetime_str.replace(', ', ',')

        # Convert bbox and datetime arrays to lists
        bbox = [float(x) if '.' in x else int(x) for x in bbox_str.split(',')]
        # Strip surrounding whitespace and either quote style from each timestamp
        datetime = [x.strip().strip('\'"') for x in datetime_str.split(',')]
    else:
        raise Exception("Could not parse bbox and datetime from STAC action input", q)

    print('datetime', datetime)
    api = Client.open(stac_endpoint)

    results = api.search(
        max_items=10,
        bbox=bbox,
        datetime=datetime,
    )
    return {
        'stac': results.item_collection_as_dict(),
        'bbox': bbox,
        'datetime': datetime
    }

async def wikipedia(q):
    async with httpx.AsyncClient() as client:
        response = await client.get("https://en.wikipedia.org/w/api.php", params={
            "action": "query",
            "list": "search",
            "srsearch": q,
            "format": "json"
        })
    hits = response.json()["query"]["search"]
    # Avoid an IndexError when the search returns nothing
    if not hits:
        return "No Wikipedia results found for {}".format(q)
    return hits[0]["snippet"]

async def geocode(q):
    # Not registered in known_actions yet. The OpenCage client call is
    # synchronous, so no HTTP client needs to be opened here.
    response = geocoder.geocode(q, no_annotations='1')
    if response and response['results']:
        return response['results'][0]['bounds']
    else:
        return None

async def calculate(what):
    # WARNING: eval runs arbitrary Python produced by the model; do not
    # expose this to untrusted input.
    return eval(what)

known_actions = {
    "wikipedia": wikipedia,
    "calculate": calculate,
    "stac": stac
}

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
aiohttp==3.8.4
aiosignal==1.3.1
anyio==3.6.2
async-timeout==4.0.2
attrs==22.2.0
backoff==2.2.1
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
click==8.1.3
cryptography==40.0.1
fastapi==0.95.0
frozenlist==1.3.3
h11==0.14.0
httpcore==0.16.3
httpx==0.23.3
idna==3.4
multidict==6.0.4
openai==0.27.4
opencage==2.1.0
pycparser==2.21
pydantic==1.10.7
pyOpenSSL==23.1.1
pystac==1.7.2
pystac-client==0.6.1
python-dateutil==2.8.2
requests==2.28.2
rfc3986==1.5.0
six==1.16.0
sniffio==1.3.0
starlette==0.26.1
tqdm==4.65.0
typing_extensions==4.5.0
urllib3==1.26.15
uvicorn==0.21.1
yarl==1.8.2

--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
(The HTML contents of this file were not preserved in this listing.)
--------------------------------------------------------------------------------