├── README.md
├── RecursiveGPT.py
└── requirements.txt

/README.md:
--------------------------------------------------------------------------------
# RecursiveGPT

Process large text files and documents with ChatGPT (more APIs coming soon\*). Requires [Python 3](https://www.python.org/downloads/) and an [OpenAI API key](https://platform.openai.com/account/api-keys).

## Use:

Currently command-line only.

```bash
git clone https://github.com/jckpn/RecursiveGPT.git
cd RecursiveGPT
pip install -r requirements.txt
python RecursiveGPT.py
```

Then just follow the prompts given in the terminal.

## How it works

The script splits the file into chunks that fit within ChatGPT's context window (currently limited to 4096 tokens). You specify a prompt, which is prepended to the start of each chunk, and the ChatGPT responses are appended one by one to an output text file.

## Examples

Your prompts could be anything, such as:
- _"Summarise the following lecture notes."_
- _"Proofread the following, providing your response in the form \[original text -> revised text\]."_
- _"Rewrite the following in the style of a pirate."_
- _"Convert the following scientific paper into a child-friendly overview."_
- _"Task: For each mention of a dog breed, state the breed and describe where in the text it occurs."_

The prompt below was _"Convert these lecture captions to concise revision notes."_

_(Screenshot of example output, 2023-05-20.)_

I made this to help me with tasks like proofreading and lecture summaries, but it works well for a range of tasks.

\**I have very little free time at the moment due to my MSc and other jobs – pull requests would be appreciated!*

(Privacy note: no data is sent anywhere except to the ChatGPT API.)
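The chunking step described above can be sketched in a few lines. This is an illustrative, simplified version (the function name `make_chunk_prompts` is not part of the script's actual API): split the document into fixed-size word chunks and prepend the same instruction to each one.

```python
def make_chunk_prompts(text, prompt, chunk_size=2500):
    """Split `text` into word chunks and prepend `prompt` to each.

    Illustrative sketch only -- the real script also appends a note about
    the chunk's position in the document before sending it to the API.
    """
    words = text.split()
    return [prompt + '\n\n' + ' '.join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]
```

Each returned string is then sent to the API as a separate request, so a document of any length can be processed one window-sized piece at a time.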
--------------------------------------------------------------------------------
/RecursiveGPT.py:
--------------------------------------------------------------------------------
import math
import os

import openai
from tqdm import tqdm


def process_chunk(prompt, chunk, output_path, model):
    with open(output_path, 'a', encoding='utf-8') as output_file:
        messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                    {'role': 'user', 'content': prompt + ' '.join(chunk)}]
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages)
        response = response['choices'][0]['message']['content']
        output_file.write(response + '\n\n')


def split_file_to_chunks(prompt, input_path, output_path, chunk_size, model):
    with open(input_path, 'r', encoding='utf-8') as file:
        content = file.read()
    words = content.split()

    # Show rough cost/time estimates and confirm with the user before
    # anything is sent to the API
    est_tokens = len(words) / 0.75  # roughly 0.75 words per token
    cost_per_token = 0.0002 / 1000
    est_cost = est_tokens * cost_per_token
    num_chunks = math.ceil(len(words) / chunk_size)
    est_time = est_tokens / 4000 * 1.5  # around 1.5 minutes per 4000 tokens

    print(f'\nEstimated tokens required: {est_tokens:.1f} ({num_chunks} prompts with up to {chunk_size} words each)')
    print(f'Estimated cost: between ${est_cost:.2f}-${est_cost*2:.2f}')
    print(f'Estimated time: {est_time:.1f} minutes')
    print('Press RETURN to continue or Ctrl+C to cancel.')
    input()

    print(f'Writing full output to file {output_path}...')

    for i in tqdm(range(0, len(words), chunk_size)):
        chunk = words[i:i + chunk_size]
        full_prompt = prompt + (f'\n(Note: the following is an extract, '
                                f'words {i}-{min(i + chunk_size, len(words))} '
                                f'of the {len(words)} word document.)\n\n')
        process_chunk(full_prompt, chunk, output_path, model)

    print(f'Finished writing to {output_path}.')


if __name__ == '__main__':
    api_key = input('Enter your OpenAI API key: ')
    openai.api_key = api_key
    # TODO: add checks for key validity

    model = input('Enter the model to be used (default: gpt-3.5-turbo, available: gpt-4): ')
    if model == '':
        model = 'gpt-3.5-turbo'

    input_path = input('Enter the input path to the text file to process: ')
    if not os.path.exists(input_path):
        print(f'`{input_path}` can\'t be found.')
        exit()

    prompt = input('Enter the prompt to be prepended to each chunk of text: ')
    if prompt == '':
        print('A prompt is required to use this script.')
        exit()

    output_path = input('Enter the output path to the text file to write to (default: output.txt): ')
    if output_path == '':
        output_path = 'output.txt'

    chunk_size = input('Enter the number of words per prompt (default: 2500): ')
    if chunk_size == '':
        chunk_size = 2500
    else:
        try:
            chunk_size = int(chunk_size)
        except ValueError:
            print('Chunk size must be a whole number.')
            exit()
        if chunk_size < 1:
            print('Chunk size must be greater than 0.')
            exit()
        elif chunk_size > 3000:
            print('Chunk sizes greater than ~3000 are likely to fail due to model limitations. Continue? (y/n)')
            if input().strip().lower() != 'y':
                exit()

    split_file_to_chunks(prompt, input_path, output_path, chunk_size, model)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
openai<1.0  # the script uses the pre-1.0 ChatCompletion API
tqdm
--------------------------------------------------------------------------------