├── README.md
├── RecursiveGPT.py
└── requirements.txt

/README.md:
--------------------------------------------------------------------------------
# RecursiveGPT

Process large text files and documents with ChatGPT (more APIs coming soon\*). Requires [Python 3](https://www.python.org/downloads/) and an [OpenAI API key](https://platform.openai.com/account/api-keys).

## Use:

Currently command-line only.

```bash
git clone https://github.com/jckpn/RecursiveGPT.git
cd RecursiveGPT
pip install -r requirements.txt
python RecursiveGPT.py
```

Then just follow the prompts given in the terminal.

## How it works

The script splits the file into chunks that fit within ChatGPT's context window (currently limited to 4096 tokens). You specify a prompt, which is prepended to the start of each chunk, and the ChatGPT responses are appended one by one to an output text file.

## Examples

Your prompts could be anything, such as:
- _"Summarise the following lecture notes."_
- _"Proofread the following, providing your response in the form \[original text -> revised text\]."_
- _"Rewrite the following in the style of a pirate."_
- _"Convert the following scientific paper into a child-friendly overview."_
- _"Task: For each mention of a dog breed, state the breed and describe where in the text it occurs."_

The prompt below was _"Convert these lecture captions to concise revision notes."_

_(Screenshot of example output, 2023-05-20.)_

I made this to help me with tasks like proofreading and lecture summaries, but it works well for a range of tasks.

\**I have very little free time at the moment due to my MSc and other jobs – pull requests would be appreciated!*

(Privacy note: no data is sent anywhere except to the ChatGPT API.)
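The chunking step described above can be sketched in a few lines. This is an illustrative, simplified version (the function name `make_chunk_prompts` is not part of the script's actual API): split the document into fixed-size word chunks and prepend the same instruction to each one.

```python
def make_chunk_prompts(text, prompt, chunk_size=2500):
    """Split `text` into word chunks and prepend `prompt` to each.

    Illustrative sketch only -- the real script also appends a note about
    the chunk's position in the document before sending it to the API.
    """
    words = text.split()
    return [prompt + '\n\n' + ' '.join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]
```

Each returned string is then sent to the API as a separate request, so a document of any length can be processed one window-sized piece at a time.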
--------------------------------------------------------------------------------
/RecursiveGPT.py:
--------------------------------------------------------------------------------
import math
import os

import openai
from tqdm import tqdm


def process_chunk(prompt, chunk, output_path, model):
    with open(output_path, 'a', encoding='utf-8') as output_file:
        messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                    {'role': 'user', 'content': prompt + ' '.join(chunk)}]
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages)
        response = response['choices'][0]['message']['content']
        output_file.write(response + '\n\n')


def split_file_to_chunks(prompt, input_path, output_path, chunk_size, model):
    with open(input_path, 'r', encoding='utf-8') as file:
        content = file.read()
    words = content.split()

    # Show rough cost/time estimates and confirm with the user before
    # anything is sent to the API
    est_tokens = len(words) / 0.75  # roughly 0.75 words per token
    cost_per_token = 0.0002 / 1000
    est_cost = est_tokens * cost_per_token
    num_chunks = math.ceil(len(words) / chunk_size)
    est_time = est_tokens / 4000 * 1.5  # around 1.5 minutes per 4000 tokens

    print(f'\nEstimated tokens required: {est_tokens:.1f} ({num_chunks} prompts with up to {chunk_size} words each)')
    print(f'Estimated cost: between ${est_cost:.2f}-${est_cost*2:.2f}')
    print(f'Estimated time: {est_time:.1f} minutes')
    print('Press RETURN to continue or Ctrl+C to cancel.')
    input()

    print(f'Writing full output to file {output_path}...')

    for i in tqdm(range(0, len(words), chunk_size)):
        chunk = words[i:i + chunk_size]
        full_prompt = prompt + (f'\n(Note: the following is an extract, '
                                f'words {i}-{min(i + chunk_size, len(words))} '
                                f'of the {len(words)} word document.)\n\n')
        process_chunk(full_prompt, chunk, output_path, model)

    print(f'Finished writing to {output_path}.')


if __name__ == '__main__':
    api_key = input('Enter your OpenAI API key: ')
    openai.api_key = api_key
    # TODO: add checks for key validity

    model = input('Enter the model to be used (default: gpt-3.5-turbo, available: gpt-4): ')
    if model == '':
        model = 'gpt-3.5-turbo'

    input_path = input('Enter the input path to the text file to process: ')
    if not os.path.exists(input_path):
        print(f'`{input_path}` can\'t be found.')
        exit()

    prompt = input('Enter the prompt to be prepended to each chunk of text: ')
    if prompt == '':
        print('A prompt is required to use this script.')
        exit()

    output_path = input('Enter the output path to the text file to write to (default: output.txt): ')
    if output_path == '':
        output_path = 'output.txt'

    chunk_size = input('Enter the number of words per prompt (default: 2500): ')
    if chunk_size == '':
        chunk_size = 2500
    else:
        try:
            chunk_size = int(chunk_size)
        except ValueError:
            print('Chunk size must be a whole number.')
            exit()
        if chunk_size < 1:
            print('Chunk size must be greater than 0.')
            exit()
        elif chunk_size > 3000:
            print('Chunk sizes greater than ~3000 are likely to fail due to model limitations. Continue? (y/n)')
            if input().strip().lower() != 'y':
                exit()

    split_file_to_chunks(prompt, input_path, output_path, chunk_size, model)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
openai<1.0  # the script uses the pre-1.0 ChatCompletion API
tqdm
--------------------------------------------------------------------------------