├── .env ├── .DS_Store ├── Public ├── .DS_Store ├── index.html ├── style.css ├── chat-interface.html ├── clone-creation.html ├── script.js └── chat-history-upload.html ├── uploads └── .DS_Store ├── cleaned_chat_history ├── .DS_Store └── .67928bae88857e758791053985b5f450.json.jsonl.swp ├── LICENSE ├── chat_with_model.py ├── clean-chat-history.py ├── telegram-clean-chat-history.py ├── fine-tune.py ├── instagram-messenger-clean-chat-history.py ├── app.js └── README.md /.env: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY=your_secret_key_here -------------------------------------------------------------------------------- /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kohhnagata/clone-factory-v1/HEAD/.DS_Store -------------------------------------------------------------------------------- /Public/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kohhnagata/clone-factory-v1/HEAD/Public/.DS_Store -------------------------------------------------------------------------------- /uploads/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kohhnagata/clone-factory-v1/HEAD/uploads/.DS_Store -------------------------------------------------------------------------------- /cleaned_chat_history/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kohhnagata/clone-factory-v1/HEAD/cleaned_chat_history/.DS_Store -------------------------------------------------------------------------------- /cleaned_chat_history/.67928bae88857e758791053985b5f450.json.jsonl.swp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kohhnagata/clone-factory-v1/HEAD/cleaned_chat_history/.67928bae88857e758791053985b5f450.json.jsonl.swp -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | WTFPL License 2 | Do What The Fuck You Want To Public License 3 | Copyright (c) 2024 kohei.eth 4 | 5 | Everyone is permitted to copy and distribute verbatim or modified 6 | copies of this license document, and changing it is allowed as long 7 | as the name is changed. 8 | 9 | DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE 10 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 11 | 12 | 0. You just DO WHAT THE FUCK YOU WANT TO. -------------------------------------------------------------------------------- /Public/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Entry Page 7 | 8 | 9 | 10 |

11 |

Clone Factory v1

12 |

13 | 14 | 15 |

16 |

17 | 18 | -------------------------------------------------------------------------------- /Public/style.css: -------------------------------------------------------------------------------- 1 | body { 2 | display: flex; 3 | margin: 0; 4 | } 5 | 6 | .side-menu { 7 | width: 200px; 8 | height: 100vh; 9 | background-color: #f0f0f0; 10 | padding: 20px; 11 | } 12 | 13 | .side-menu a { 14 | display: block; 15 | margin-bottom: 10px; 16 | text-decoration: none; 17 | color: #333; 18 | } 19 | 20 | .content { 21 | flex-grow: 1; 22 | padding: 20px; 23 | } 24 | 25 | #chat-container { 26 | width: 400px; 27 | margin: auto; 28 | border: 1px solid #ccc; 29 | padding: 20px; 30 | } 31 | 32 | #chat-box { 33 | height: 300px; 34 | overflow-y: auto; 35 | border: 1px solid #000; 36 | margin-bottom: 10px; 37 | padding: 5px; 38 | } 39 | 40 | #chat-input { 41 | width: calc(100% - 60px); 42 | margin-right: 10px; 43 | } 44 | 45 | #send-btn { 46 | width: 50px; 47 | } 48 | 49 | .message { 50 | margin: 5px 0; 51 | padding: 5px; 52 | border: 1px solid #ddd; 53 | } 54 | 55 | -------------------------------------------------------------------------------- /chat_with_model.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, request, jsonify, render_template 2 | import os 3 | from openai import OpenAI 4 | from flask_cors import CORS 5 | 6 | client = OpenAI(api_key=os.getenv("OPENAI_API_KEY")) 7 | 8 | template_dir = os.path.join(os.path.dirname(__file__), 'Public') 9 | app = Flask(__name__, template_folder=template_dir) 10 | CORS(app) 11 | 12 | @app.route('/') 13 | def index(): 14 | return render_template('chat-interface.html') 15 | 16 | @app.route('/chat', methods=['POST']) 17 | def chat(): 18 | model_name = request.form['model_name'] 19 | user_input = request.form['user_input'] 20 | 21 | messages = [{'role': 'user', 'content': user_input}] 22 | 23 | try: 24 | response = client.chat.completions.create(model=model_name, messages=messages) 25 | return jsonify({'response': response.choices[0].message.content.strip()}) 26 | except Exception as e: 27 | return jsonify({'response': 'Error: ' + str(e)}) 28 | 29 | if __name__ == '__main__': 30 | app.run(debug=True) 31 | -------------------------------------------------------------------------------- /Public/chat-interface.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Chat with your Model 7 | 8 | 9 | 29 | 30 | 31 | 37 |

38 |

Chat with Your Clone

39 |

40 |

41 | 42 | 43 | 44 |

45 | 46 | 47 | -------------------------------------------------------------------------------- /Public/clone-creation.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Create Your Clone 5 | 6 | 7 | 8 | 14 |

15 |

Create Your Clone

16 |

17 | 18 |

19 |

20 |

21 |

22 |

Side Notes:

23 |

Click the button above and monitor progress at https://platform.openai.com/finetune
If no training appears on the OpenAI Platform, check your terminal for errors

27 |

28 |

29 |

30 | 31 | 49 | 50 | 51 | -------------------------------------------------------------------------------- /Public/script.js: -------------------------------------------------------------------------------- 1 | document.getElementById('setName').addEventListener('click', function() { 2 | const chatName = document.getElementById('chatName').value; 3 | fetch('/api/set-name', { 4 | method: 'POST', 5 | headers: { 6 | 'Content-Type': 'application/json', 7 | }, 8 | body: JSON.stringify({ name: chatName }), 9 | }) 10 | .then(response => response.json()) 11 | .then(data => { 12 | alert('Chat name set successfully!'); 13 | }) 14 | .catch((error) => { 15 | console.error('Error:', error); 16 | }); 17 | }); 18 | 19 | document.getElementById('logout').addEventListener('click', function() { 20 | fetch('/api/logout') 21 | .then(() => { 22 | window.location.href = '/'; 23 | }) 24 | .catch((error) => { 25 | console.error('Error:', error); 26 | }); 27 | }); 28 | 29 | document.getElementById('deleteAccount').addEventListener('click', function() { 30 | if (confirm('Are you sure you want to delete your account? This action cannot be undone.')) { 31 | fetch('/api/delete-account') 32 | .then(() => { 33 | window.location.href = '/'; 34 | }) 35 | .catch((error) => { 36 | console.error('Error:', error); 37 | }); 38 | } 39 | }); 40 | 41 | document.getElementById('send-btn').addEventListener('click', function() { 42 | const input = document.getElementById('chat-input'); 43 | const message = input.value.trim(); 44 | if (message) { 45 | displayMessage(message, 'You'); 46 | sendMessageToServer(message); 47 | } 48 | input.value = ''; 49 | }); 50 | 51 | function displayMessage(message, sender) { 52 | const chatBox = document.getElementById('chat-box'); 53 | const msgElement = document.createElement('div'); 54 | msgElement.classList.add('message'); 55 | msgElement.textContent = `${sender}: ${message}`; 56 | chatBox.appendChild(msgElement); 57 | } 58 | 59 | function sendMessageToServer(message) { 60 | fetch('/send-message', { 61 | method: 'POST', 62 | headers: { 63 | 'Content-Type': 'application/json', 64 | }, 65 | body: JSON.stringify({ message }), 66 | }) 67 | .then(response => response.json()) 68 | .then(data => { 69 | displayMessage(data.reply, 'AI Friend'); 70 | }) 71 | .catch(error => console.error('Error:', error)); 72 | } 73 | -------------------------------------------------------------------------------- /Public/chat-history-upload.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Side Menu Navigation 7 | 8 | 21 | 22 | 23 | 29 |

30 |

Upload Chat History

31 | 53 |

54 | 55 | 56 | -------------------------------------------------------------------------------- /clean-chat-history.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import json 3 | import re 4 | from datetime import datetime, timedelta 5 | 6 | def parse_chat_to_json(input_text, custom_user_name): 7 | lines = input_text.split('\n') 8 | messages = [] 9 | 10 | current_time = None 11 | current_messages = [] 12 | has_user = has_assistant = False 13 | 14 | for line in lines: 15 | if "] " in line: 16 | parts = line.split('] ') 17 | timestamp_sender = parts[0][1:] 18 | message_content = '] '.join(parts[1:]) 19 | 20 | if ": " in message_content: 21 | sender, content = message_content.split(': ', 1) 22 | role = "user" if custom_user_name.lower() not in sender.lower() else "assistant" 23 | 24 | content = re.sub(r'[^\w\s]|http\S+', '', content) 25 | 26 | timestamp_format = "%m/%d/%y, %I:%M:%S %p" 27 | try: 28 | timestamp = datetime.strptime(timestamp_sender, timestamp_format) 29 | except ValueError: 30 | print(f"Error parsing timestamp: {timestamp_sender}") 31 | continue 32 | 33 | if current_time is None or timestamp - current_time > timedelta(hours=1): 34 | if has_user and has_assistant: 35 | messages.append({"messages": current_messages}) 36 | current_messages = [] 37 | current_time = timestamp 38 | has_user = has_assistant = False 39 | 40 | current_messages.append({"role": role, "content": content.strip()}) 41 | if role == "user": 42 | has_user = True 43 | else: 44 | has_assistant = True 45 | 46 | if has_user and has_assistant: 47 | messages.append({"messages": current_messages}) 48 | 49 | return messages 50 | 51 | def clean_chat_history(input_file_path, output_file_path, user_name): 52 | with open(input_file_path, 'r', encoding='utf-8') as file: 53 | chat_history = file.read() 54 | 55 | parsed_messages = parse_chat_to_json(chat_history, user_name) 56 | 57 | output_file_path = output_file_path.rstrip('.txt') + '.jsonl' 58 | 59 | with open(output_file_path, 'w', encoding='utf-8') as file: 60 | for message_group in parsed_messages: 61 | json.dump(message_group, file, ensure_ascii=False) 62 | file.write('\n') 63 | 64 | print(f"Formatted chat history saved to {output_file_path}") 65 | 66 | if __name__ == "__main__": 67 | if len(sys.argv) != 4: 68 | print("Usage: python clean_chat_history.py ") 69 | sys.exit(1) 70 | 71 | input_file_path = sys.argv[1] 72 | output_file_path = sys.argv[2] 73 | user_name = sys.argv[3] 74 | clean_chat_history(input_file_path, output_file_path, user_name) 75 | -------------------------------------------------------------------------------- /telegram-clean-chat-history.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import sys 4 | import json 5 | from datetime import datetime, timedelta 6 | 7 | def clean_content(content): 8 | content = re.sub(r'http\S+', '', content) 9 | content = re.sub(r'[^\w\s]', '', content) 10 | return content 11 | 12 | def parse_chat_to_json(input_data, custom_user_name): 13 | messages = input_data['messages'] 14 | grouped_messages = [] 15 | 16 | current_time = None 17 | current_messages = [] 18 | has_user = has_assistant = False 19 | 20 | for message in messages: 21 | if message['type'] == 'message': 22 | timestamp = datetime.fromisoformat(message['date']) 23 | sender = message['from'] 24 | if isinstance(message.get('text'), list): 25 | content = ''.join([str(item) if isinstance(item, str) else item.get('text', '') for item in message['text']]) 26 | else: 27 | content = message.get('text', '') 28 | role = "user" if custom_user_name.lower() not in sender.lower() else "assistant" 29 | 30 | content = clean_content(content) 31 | 32 | if content.strip(): 33 | if current_time is None or timestamp - current_time > timedelta(hours=1): 34 | if has_user and has_assistant: 35 | grouped_messages.append({"messages": current_messages}) 36 | current_messages = [] 37 | current_time = timestamp 38 | has_user = has_assistant = False 39 | 40 | current_messages.append({"role": role, "content": content.strip()}) 41 | if role == "user": 42 | has_user = True 43 | else: 44 | has_assistant = True 45 | 46 | if has_user and has_assistant: 47 | grouped_messages.append({"messages": current_messages}) 48 | 49 | return grouped_messages 50 | 51 | def clean_chat_history(input_file_path, output_file_path, user_name): 52 | with open(input_file_path, 'r', encoding='utf-8') as file: 53 | chat_history = json.load(file) 54 | 55 | parsed_messages = parse_chat_to_json(chat_history, user_name) 56 | 57 | output_file_path = output_file_path.rstrip('.json') + '.jsonl' 58 | 59 | with open(output_file_path, 'w', encoding='utf-8') as file: 60 | for message_group in parsed_messages: 61 | if message_group['messages']: 62 | json.dump(message_group, file, ensure_ascii=False) 63 | file.write('\n') 64 | 65 | if __name__ == "__main__": 66 | if len(sys.argv) != 4: 67 | print("Usage: python clean_chat_history.py ") 68 | sys.exit(1) 69 | 70 | input_file_path = sys.argv[1] 71 | output_file_path = sys.argv[2] 72 | user_name = sys.argv[3] 73 | clean_chat_history(input_file_path, output_file_path, user_name) 74 | -------------------------------------------------------------------------------- /fine-tune.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | from openai import OpenAI 4 | from sklearn.model_selection import train_test_split 5 | 6 | client = OpenAI(api_key=os.getenv("OPENAI_API_KEY")) 7 | 8 | cleaned_chat_history_path = "./cleaned_chat_history" 9 | 10 | def combine_and_split_data(path, train_split=0.8): 11 | all_data = [] 12 | for filename in os.listdir(path): 13 | if filename.startswith('.') or filename.endswith('.swp'): 14 | continue 15 | filepath = os.path.join(path, filename) 16 | with open(filepath, 'rb') as f: 17 | for line in f: 18 | try: 19 | decoded_line = line.decode('utf-8', 'ignore') 20 | data = json.loads(decoded_line) 21 | all_data.append(data) 22 | except json.JSONDecodeError as e: 23 | print(f"Error decoding JSON from file {filename}: {e}") 24 | except UnicodeDecodeError as e: 25 | print(f"Unicode decode error in file {filename}: {e}") 26 | 27 | training_data, validation_data = train_test_split( 28 | all_data, test_size=1 - train_split, random_state=42 29 | ) 30 | 31 | train_file = os.path.join(path, "for-training.jsonl") 32 | val_file = os.path.join(path, "for-validation.jsonl") 33 | 34 | with open(train_file, "w", encoding='utf-8') as f: 35 | for item in training_data: 36 | json.dump(item, f) 37 | f.write("\n") 38 | 39 | with open(val_file, "w", encoding='utf-8') as f: 40 | for item in validation_data: 41 | json.dump(item, f) 42 | f.write("\n") 43 | 44 | return train_file, val_file 45 | 46 | def start_fine_tuning(training_file_path, validation_file_path): 47 | """ 48 | Uploads training and validation files and initiates fine-tuning with GPT-3.5 Turbo. 49 | 50 | Args: 51 | training_file_path (str): Path to the training JSONL file. 52 | validation_file_path (str): Path to the validation JSONL file. 53 | """ 54 | 55 | if not os.path.exists(training_file_path) or not os.path.exists(validation_file_path): 56 | print(f"Error: Training or validation file not found at {training_file_path} or {validation_file_path}") 57 | return 58 | 59 | try: 60 | training_file = client.files.create(file=open(training_file_path, "rb"), purpose="fine-tune") 61 | validation_file = client.files.create(file=open(validation_file_path, "rb"), purpose="fine-tune") 62 | 63 | suffix_name = "pocoloco-test" 64 | response = client.fine_tuning.jobs.create( 65 | training_file=training_file.id, 66 | validation_file=validation_file.id, 67 | model="gpt-3.5-turbo", 68 | suffix=suffix_name, 69 | ) 70 | print(f"Fine-tuning job started. Response: {response}") 71 | except Exception as e: 72 | print(f"Error during fine-tuning: {e}") 73 | 74 | 75 | if __name__ == "__main__": 76 | training_file_path, validation_file_path = combine_and_split_data(cleaned_chat_history_path) 77 | start_fine_tuning(training_file_path, validation_file_path) 78 | print("Chat clone generation using GPT-3.5 Turbo initiated.") 79 | -------------------------------------------------------------------------------- /instagram-messenger-clean-chat-history.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import sys 4 | import json 5 | from datetime import datetime, timedelta 6 | 7 | def is_unnecessary_content(content): 8 | unnecessary_patterns = [ 9 | re.compile(r"reaccion\w*\s+.*\s+a tu mensaje", re.IGNORECASE), 10 | re.compile(r"reacted\s+.*\s*to your message", re.IGNORECASE), 11 | re.compile(r"reaccion[oó]\s+[\s\S]*?\s*a tu mensaje", re.IGNORECASE), 12 | ] 13 | return any(pattern.search(content) for pattern in unnecessary_patterns) 14 | 15 | def clean_content(content): 16 | if is_unnecessary_content(content): 17 | return "" 18 | content = re.sub(r'http\S+', '', content) 19 | content = re.sub(r'[^\w\s]', '', content) 20 | return content.strip() 21 | 22 | def parse_chat_to_json(input_data, custom_user_name): 23 | messages = input_data['messages'] 24 | participants = {p['name']: "user" if custom_user_name.lower() not in p['name'].lower() else "assistant" for p in input_data['participants']} 25 | parsed_messages = [] 26 | 27 | current_time = None 28 | current_messages = [] 29 | has_user = has_assistant = False 30 | 31 | for message in reversed(messages): 32 | sender_name = message['sender_name'] 33 | content = message.get('content', '') 34 | timestamp_ms = message['timestamp_ms'] 35 | role = participants.get(sender_name, "user") 36 | 37 | content = clean_content(content) 38 | if not content: 39 | continue 40 | 41 | timestamp = datetime.fromtimestamp(timestamp_ms / 1000.0) 42 | if current_time is None or timestamp - current_time > timedelta(hours=1): 43 | if has_user and has_assistant: 44 | parsed_messages.append({"messages": current_messages}) 45 | current_messages = [] 46 | current_time = timestamp 47 | has_user = has_assistant = False 48 | 49 | current_messages.append({"role": role, "content": content}) 50 | if role == "user": 51 | has_user = True 52 | else: 53 | has_assistant = True 54 | 55 | if has_user and has_assistant: 56 | parsed_messages.append({"messages": current_messages}) 57 | 58 | return parsed_messages 59 | 60 | def clean_chat_history(input_file_path, output_file_path, user_name): 61 | with open(input_file_path, 'r', encoding='utf-8') as file: 62 | chat_history = json.load(file) 63 | 64 | parsed_messages = parse_chat_to_json(chat_history, user_name) 65 | 66 | output_file_path = output_file_path.rstrip('.json') + '.jsonl' 67 | 68 | with open(output_file_path, 'w', encoding='utf-8') as file: 69 | for message_group in parsed_messages: 70 | if message_group['messages']: 71 | json.dump(message_group, file, ensure_ascii=False) 72 | file.write('\n') 73 | 74 | if __name__ == "__main__": 75 | if len(sys.argv) != 4: 76 | print("Usage: python clean_chat_history.py ") 77 | sys.exit(1) 78 | 79 | input_file_path = sys.argv[1] 80 | output_file_path = sys.argv[2] 81 | user_name = sys.argv[3] 82 | clean_chat_history(input_file_path, output_file_path, user_name) 83 | -------------------------------------------------------------------------------- /app.js: -------------------------------------------------------------------------------- 1 | require('dotenv').config(); 2 | const express = require('express'); 3 | const path = require('path'); 4 | const passport = require('passport'); 5 | const session = require('express-session'); 6 | const multer = require('multer'); 7 | const { execFile } = require('child_process'); 8 | const fs = require('fs'); 9 | const http = require('http'); 10 | const { spawn } = require('child_process'); 11 | const { createProxyMiddleware } = require('http-proxy-middleware'); 12 | 13 | const app = express(); 14 | const server = http.createServer(app); 15 | const io = require('socket.io')(server); 16 | 17 | const PORT = process.env.PORT || 3000; 18 | const upload = multer({ dest: 'uploads/' }); 19 | 20 | app.use(express.static(path.join(__dirname, 'public'))); 21 | 22 | passport.serializeUser((user, done) => done(null, user)); 23 | passport.deserializeUser((obj, done) => done(null, obj)); 24 | 25 | app.use(session({ secret: "secret", resave: false, saveUninitialized: true })); 26 | app.use(passport.initialize()); 27 | app.use(passport.session()); 28 | // Proxy endpoint 29 | app.use('/chat', createProxyMiddleware({ 30 | target: 'http://127.0.0.1:5000', 31 | changeOrigin: true, 32 | })); 33 | 34 | app.get('/', (req, res) => res.sendFile(path.join(__dirname, 'public', 'index.html'))); 35 | 36 | const cleanedDirPath = path.join(__dirname, 'cleaned_chat_history'); 37 | if (!fs.existsSync(cleanedDirPath)) fs.mkdirSync(cleanedDirPath); 38 | 39 | function processSingleChatHistory(file, scriptName) { 40 | return new Promise((resolve, reject) => { 41 | const inputFilePath = file.path; 42 | const outputFilePath = path.join(cleanedDirPath, `${file.filename}.json`); 43 | execFile('python3', [scriptName, inputFilePath, outputFilePath], (error, stdout, stderr) => { 44 | if (error) { 45 | console.error(`Error: ${error.message}`); 46 | reject(`Error processing chat history file: ${file.filename}`); 47 | } else { 48 | resolve(stdout); 49 | } 50 | }); 51 | }); 52 | } 53 | 54 | function processChatHistory(req, res, scriptName) { 55 | if (req.files && req.files.length > 0) { 56 | Promise.all(req.files.map(file => processSingleChatHistory(file, scriptName))) 57 | .then(results => { 58 | res.send('All chat history files processed successfully.'); 59 | }) 60 | .catch(error => { 61 | res.status(500).send(error); 62 | }); 63 | } else { 64 | res.status(400).send('No files uploaded.'); 65 | } 66 | } 67 | 68 | function processSingleChatHistory(file, scriptName, userName) { 69 | return new Promise((resolve, reject) => { 70 | const inputFilePath = file.path; 71 | const outputFilePath = path.join(cleanedDirPath, `${file.filename}.json`); 72 | execFile('python3', [scriptName, inputFilePath, outputFilePath, userName], (error, stdout, stderr) => { 73 | if (error) { 74 | console.error(`Error: ${error.message}`); 75 | reject(`Error processing chat history file: ${file.filename}`); 76 | } else { 77 | resolve(stdout); 78 | } 79 | }); 80 | }); 81 | } 82 | 83 | app.post('/upload-whatsapp-history', upload.array('chatHistory', 10), (req, res) => { 84 | const userName = req.body.userName; 85 | Promise.all(req.files.map(file => processSingleChatHistory(file, 'clean-chat-history.py', userName))) 86 | .then(results => { 87 | res.send('

All chat history files processed successfully.

Go Back'); 88 | }) 89 | .catch(error => { 90 | res.status(500).send(error); 91 | }); 92 | }); 93 | 94 | app.post('/upload-telegram-history', upload.array('chatHistory', 10), (req, res) => { 95 | const userName = req.body.userName; 96 | Promise.all(req.files.map(file => processSingleChatHistory(file, 'telegram-clean-chat-history.py', userName))) 97 | .then(results => { 98 | res.send('

All chat history files processed successfully.

Go Back'); 99 | }) 100 | .catch(error => { 101 | res.status(500).send(error); 102 | }); 103 | }); 104 | 105 | app.post('/upload-instagram-messenger-history', upload.array('chatHistory', 10), (req, res) => { 106 | const userName = req.body.userName; 107 | Promise.all(req.files.map(file => processSingleChatHistory(file, 'instagram-messenger-clean-chat-history.py', userName))) 108 | .then(results => { 109 | res.send('

All chat history files processed successfully.

Go Back'); 110 | }) 111 | .catch(error => { 112 | res.status(500).send(error); 113 | }); 114 | }); 115 | 116 | app.post('/upload-chat-history', async (req, res) => { 117 | const cleanedChatContent = cleanChatHistory(req.file); 118 | await storeChatHistory(req.user.id, req.body.chatType, cleanedChatContent); 119 | res.send('

All chat history files processed successfully.

Go Back'); 120 | }); 121 | 122 | app.post('/start-fine-tuning', (req, res) => { 123 | const fineTuneProcess = spawn('python3', ['./fine-tune.py']); 124 | 125 | fineTuneProcess.stdout.on('data', (data) => { 126 | console.log(`stdout: ${data}`); 127 | }); 128 | 129 | fineTuneProcess.stderr.on('data', (data) => { 130 | console.error(`stderr: ${data}`); 131 | }); 132 | 133 | fineTuneProcess.on('close', (code) => { 134 | console.log(`child process exited with code ${code}`); 135 | }); 136 | 137 | res.json({message: "Fine-tuning process started."}); 138 | }); 139 | 140 | 141 | app.get('/chat-history-upload', (req, res) => res.sendFile(path.join(__dirname, 'public', 'chat-history-upload.html'))); 142 | 143 | app.get('/chat-interface', (req, res) => { 144 | res.sendFile(path.join(__dirname, 'public', 'chat-interface.html')); 145 | }); 146 | 147 | app.get('/clone-creation', (req, res) => { 148 | res.sendFile(path.join(__dirname, 'public', 'clone-creation.html')); 149 | }); 150 | 151 | io.on('connection', (socket) => { 152 | console.log('A user connected'); 153 | 154 | socket.on('disconnect', () => { 155 | console.log('User disconnected'); 156 | }); 157 | 158 | socket.on('chat message', (msg) => { 159 | io.emit('chat message', msg); 160 | }); 161 | }); 162 | 163 | server.listen(PORT, () => console.log(`Server running on port ${PORT}`)); 164 | 165 | app.post('/send-message', (req, res) => { 166 | const userMessage = req.body.message; 167 | const aiReply = 'This is a placeholder response from AI'; 168 | res.json({ reply: aiReply }); 169 | }); 170 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Motivation 2 | 3 | So, pretty much everywhere you look, people imagine and build Robot or AI from the productivity perspective. But growing up in Japan until I hit 17, I got a different vibe. Over there, robots in comics and anime are more like your friends or family than just automation tools. I mean, who hasn't wanted a pal like Doraemon? Growing up with these stories, I always dreamed that someday I’d have my own robot friend, someone as tight with me as Nobita is with Doraemon. 4 | 5 | Of course, building a robot that has human like intelegence and emotion is still out there. 6 | But, I figured, why not start small? I wanted to make a chat clone of myself. 7 | Picture it, when you are feeling low and your clone picks you right up, reminding you of the stuff that really matters. 8 | Or think about having someone who gets you on a level that just seems beyond what anyone else can grasp. 9 | 10 | But man, finding a simple way to make this clone was a mission. 11 | GPTs accepts up to only 20 small files to add knoweldges. 12 | AI clone apps like Replika, [c.ai](http://c.ai/), [Delphi.ai](http://delphi.ai/) don’t let you mess with your original data much. 13 | fine-tuning GPT or other LLM requires dataset in a very specific format and this data cleaning task is absolutely nightmare. 14 | 15 | I guess for experienced devs, tweaking a model with your dataset is easy but non-technical ppl like me, it’s just alot. 16 | And I'm not the only one – I've seen loads of folks on Reddit hitting the same wall. 17 | 18 | if i cant find a simple solution, then fuck it i'll build it. 19 | Spent the past 1 month talking to LLMs and built a no-frills web app (for local use) where you can upload your chat history, create a clone, and start chatting away. 20 | Just clone the repo, set it up on your local machine, and you’re good to go. 21 | 22 | It’s still a work in progress. I've got a whole list of things I want to add – like supporting more chat apps, bumping up the fine-tuning quality, and making the chat feel more natural. If you’re interested in improving this app or helping ppl to build relationship with machines, hit me up. @ https://twitter.com/koheingt 23 | 24 | # Key Features 25 | 26 | - Upload chat history from WhatsApp, Telegram, Instagram, and Facebook Messenger 27 | - Clean and reformat chat history for preparing dataset 28 | - Fine-tune a chat model based on users chat history 29 | - Engage in conversations with the your chat clone through a web interface 30 | 31 | # Prerequisites 32 | 33 | Before starting, ensure you have the following: 34 | 35 | - Node.js 36 | - Python 3 37 | - Required npm modules (detailed below) 38 | - Necessary Python packages (specified below) 39 | - OpenAI API key 40 | - Your chat history files 41 | 42 | # Privacy 43 | 44 | Chat history is such a highly personal and confidential thing. That's why I've crafted this app to prioritize your privacy above all. To use it, simply clone the repository and deploy it locally on your computer. This way, your chat history remains securely within your own system, without the risk of exposure. 45 | 46 | However, it's important to note there is one exception. During the model fine-tuning process, your chat history files will be uploaded to your OpenAI Platform account. This step is crucial for personalizing the model to better serve your needs. Rest assured, you have the option to delete these files from your account once fine-tuning is complete. I believe in transparency and want to ensure you're fully informed about how your data is handled. 47 | 48 | # Installation Guide 49 | 50 | This guide is designed to be accessible for everyone especially non-tech ppl. If you're already familiar with the prerequisites, feel free to skip the corresponding steps. This guide is primarily for macOS users, but steps for Windows users may be added in the future. 51 | 52 | # Setting Up Your Environment 53 | 54 | ## Install Homebrew: 55 | 56 | Open Terminal and execute: 57 | 58 | ``` 59 | /bin/bash -c "$(curl -fsSL )" 60 | ``` 61 | 62 | ## Install Git: 63 | 64 | In Terminal, run: 65 | 66 | ``` 67 | brew install git 68 | ``` 69 | 70 | ## Clone the Repository: 71 | 72 | Execute the following command in Terminal: 73 | 74 | ``` 75 | git clone https://github.com/kohhnagata/clone-factory-v1 76 | ``` 77 | 78 | ## Install Node.js and npm: 79 | 80 | Visit Node.js official website and download the LTS version installer. 81 | Follow the installation instructions. 82 | Verify the installation by running the following commands in Terminal: 83 | 84 | ``` 85 | node -v 86 | npm -v 87 | ``` 88 | 89 | ## Install Node.js Dependencies: 90 | 91 | In Terminal, execute: 92 | 93 | ``` 94 | npm install 95 | ``` 96 | 97 | ## Install Required npm Modules: 98 | 99 | Run the following command in Terminal: 100 | 101 | ``` 102 | npm install dotenv express passport express-session multer http-proxy-middleware socket.io 103 | ``` 104 | 105 | ## Install Python: 106 | 107 | Execute in Terminal: 108 | 109 | ``` 110 | brew install python3 111 | ``` 112 | 113 | ## Set Up Python Virtual Environment: 114 | 115 | Locate the clone-factory-v1 directory in Finder and note its path. 116 | In Terminal, replace /path/to/your/repository with your actual path and execute the following commands: 117 | 118 | ``` 119 | cd /path/to/your/clone-factory-v1 120 | python3 -m venv myenv 121 | source myenv/bin/activate 122 | ``` 123 | 124 | ## Install Python Dependencies: 125 | 126 | In Terminal, run: 127 | 128 | ``` 129 | pip install Flask flask-cors openai scikit-learn 130 | ``` 131 | 132 | ## Set Up OPENAI_API_KEY: 133 | 134 | Visit OpenAI Platform, create an account, and navigate to the API keys tab. 135 | Create a new secret key, copy it, and replace your_secrect_key_here in the .env file (ensure there are no spaces around the = sign): 136 | 137 | ``` 138 | OPENAI_API_KEY=your_secret_key_here 139 | ``` 140 | 141 | ## Fund Your OpenAI Account: 142 | 143 | Model fine-tuning requires funding. The cost depends on your data volume, but typically $5-$10 is sufficient. 144 | Navigate to the Billing tab in Settings and add funds as needed. 145 | 146 | # Getting Your Chat History 147 | 148 | ## Telegram 149 | 150 | 1. Make sure you have the desktop version of Telegram installed on your computer. If not, download it from Telegram's official website and log in with your credentials. 151 | 2. Click on the three lines (menu) icon in the top-left corner of the app. From the menu, select "Settings" or "Preferences" (this may vary depending on your operating system). 152 | 3. In the settings menu, look for an "Advanced" option. Click on it to access the advanced settings. 153 | 4. Within the advanced settings, you should find an option to "Export Telegram data" in the buttom. This feature allows you to download your chat history, along with other data like media and files. Click on this option. 154 | 5. Uncheck unnessesary data (ex. Account Information, Contact list, Story archive, Private groups, Photos) and only check “Personal chats”. 155 | 156 | 👉 And most importantly make sure to increase the Size Limit from 8MB to 4000MB and **choose “Machine-readable JSON” as data format**. 157 | 158 | 6. Click on the "Export" button. Telegram will then prepare the data for download. This process can take some time, depending on the amount of data you are exporting. The data is typically saved in a ZIP archive. Unzip this archive to access your chat history. 159 | 160 | ## WhatsApp 161 | 162 | 1. Launch WhatsApp on your smartphone. 163 | 2. Navigate to the chat you wish to download the history of. 164 | 3. Tap on the chat name at the top of the screen or tap the three dots/menu button in the top right corner (this might vary slightly depending on your device) to access chat settings. 165 | 4. Scroll down and select "Export chat" and choose “Without Media”. 166 | 5. You'll be prompted to choose how you want to export the chat. You can email it to yourself, use cloud storage services like Google Drive or Dropbox, or save to files. 167 | 168 | ## Instagram 169 | 170 | 1. Launch the Instagram app or website and log in to your account. 171 | 2. On the mobile app, go to your profile, tap the menu (three lines) in the top right corner, and select "Settings". On the web, click your profile picture in the top right corner, select "Settings", and then click "Your activity". 172 | 3. Click Download or transfer information, check your instagram account and tap Next. 173 | 4. Click Some of your information, check only “Messages” and tap Next. 174 | 5. Check Download to device, and tap Next. 175 | 176 | 👉on the next page, Choose All time as Data range (or customise as you wish), **Choose JSON as data format**, media quality is no-related since you are only downloading chat history. 177 | 6. Submit the request and wait for an email with the download link. 178 | 179 | ## Facebook Messenger 180 | 181 | 1. Log in to your Facebook account and go to "Settings & Privacy" > "Settings" > "Your Facebook Information" > "View" beside "Download Your Information." 182 | 2. Select "Request a download," check only “Messages” and tap Next. 183 | 3. Check Download to device, and tap Next. 184 | 185 | 👉on the next page, Choose All time as Data range (or customise as you wish), **Choose JSON as data format**, media quality is no-related since you are only downloading chat history. 186 | 4. Submit the request and wait for an email with the download link. 187 | 188 | # Starting the Application 189 | 190 | ## Start the Node.js Server: 191 | 192 | Replace /path/to/your/ with your actual path to clone-factory-v1 file and run the following command in Terminal: 193 | 194 | ``` 195 | cd /path/to/your/clone-factory-v1 196 | node app.js 197 | ``` 198 | 199 | ## Start the Flask Server: 200 | 201 | Open “another” Terminal window, replace /path/to/your/ with your actual path, and execute the following commands: 202 | 203 | ``` 204 | cd /path/to/your/clone-factory-v1 205 | export FLASK_APP=chat_with_model.py 206 | flask run 207 | ``` 208 | 209 | ## Open the application: 210 | 211 | Open a web browser and navigate to [http://localhost:3000](http://localhost:3000/). 212 | 213 | # Playing the Application 214 | 215 | ## Upload chat history 216 | 217 | Navigate to the Upload Chat History page within the application. 218 | Follow [the getting-your-chat-history guide](https://www.notion.so/Guide-Book-Clone-Factory-v1-4acbdbb7d116419ca07ae6369be27427?pvs=21) to download your chat history and upload it to the application. Ensure your name matches exactly as it appears in the chat application to facilitate accurate message extraction. 219 | 220 | ## Start fine-tuning a model 221 | 222 | Visit the Create Your Clone page. 223 | Click the Start Fine-Tuning button and monitor the fine-tuning progress at https://platform.openai.com/finetune. The fine-tuning process includes several statuses: Created fine-tuning job, Validating training file, Fine-tuning job started, New fine-tuned model created, and The job has successfully completed. 224 | If the OPENAI Platform's fine-tuning page doesn't display any model training, check your terminal for potential errors. 225 | 226 | ## Chat with your clone 227 | 228 | After the fine-tuning job completes, copy the model ID (not the job ID) from https://platform.openai.com/finetune. The model ID looks like ft:gpt-3.5-turbo-0125:personal:clone-factory-v1:90UcjZ7r. 229 | Navigate to the application's chat interface page and paste the model ID into the Enter your model name section. 230 | Everything is now set up for you to enjoy chatting with your clone! 231 | --------------------------------------------------------------------------------