├── .gitignore ├── assets ├── image_example_1.jpeg └── image_example_2.jpeg ├── data └── kakaotalk_data │ ├── KakaoTalkChats.txt │ └── process_data.py ├── requirements.txt ├── README.md └── main.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore venv folder 2 | venv/ 3 | 4 | # Ignore .env file 5 | .env 6 | 7 | # Ignore db folder 8 | db/ -------------------------------------------------------------------------------- /assets/image_example_1.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sanggubot/doppelganger-gpt/HEAD/assets/image_example_1.jpeg -------------------------------------------------------------------------------- /assets/image_example_2.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sanggubot/doppelganger-gpt/HEAD/assets/image_example_2.jpeg -------------------------------------------------------------------------------- /data/kakaotalk_data/KakaoTalkChats.txt: -------------------------------------------------------------------------------- 1 | others_name 님과 카카오톡 대화 2 | 저장한 날짜 : 2023년 1월 1일 오전 1:01 3 | 4 | 5 | 2023년 1월 1일 오후 1:01 6 | 2022년 1월 1일 오후 1:01, my_name : this is my example chat 7 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 8 | 2022년 1월 1일 오후 1:01, others_name : example chat 9 | 2022년 1월 1일 오후 1:01, my_name : this is my example chat 10 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 11 | 2022년 1월 1일 오후 1:01, others_name : example chat 12 | 2022년 1월 1일 오후 1:01, my_name : this is my example chat 13 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 14 | 2022년 1월 1일 오후 1:01, others_name : example chat 15 | 2022년 1월 1일 오후 1:01, my_name : this is my example chat 16 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 17 | 2022년 1월 1일 오후 1:01, others_name : example chat 18 | 2022년 1월 1일 오후 1:01, 
my_name : this is my example chat 19 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 20 | 2022년 1월 1일 오후 1:01, others_name : example chat 21 | 2022년 1월 1일 오후 1:01, my_name : this is my example chat 22 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 23 | 2022년 1월 1일 오후 1:01, others_name : example chat 24 | 2022년 1월 1일 오후 1:01, my_name : this is my example chat 25 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 26 | 2022년 1월 1일 오후 1:01, others_name : example chat 27 | 2022년 1월 1일 오후 1:01, my_name : this is my example chat 28 | 2022년 1월 1일 오후 1:01, others_name : this is other's splitted 29 | 2022년 1월 1일 오후 1:01, others_name : example chat -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.8.4 2 | aiosignal==1.3.1 3 | anyio==3.6.2 4 | async-timeout==4.0.2 5 | attrs==22.2.0 6 | backoff==2.2.1 7 | certifi==2022.12.7 8 | charset-normalizer==3.1.0 9 | chromadb==0.3.20 10 | click==8.1.3 11 | clickhouse-connect==0.5.18 12 | dataclasses-json==0.5.7 13 | duckdb==0.7.1 14 | fastapi==0.95.0 15 | filelock==3.10.7 16 | frozenlist==1.3.3 17 | h11==0.14.0 18 | hnswlib==0.7.0 19 | httptools==0.5.0 20 | huggingface-hub==0.13.3 21 | idna==3.4 22 | Jinja2==3.1.2 23 | joblib==1.2.0 24 | langchain==0.0.130 25 | lz4==4.3.2 26 | MarkupSafe==2.1.2 27 | marshmallow==3.19.0 28 | marshmallow-enum==1.5.1 29 | monotonic==1.6 30 | mpmath==1.3.0 31 | multidict==6.0.4 32 | mypy-extensions==1.0.0 33 | networkx==3.0 34 | nltk==3.8.1 35 | numpy==1.24.2 36 | openai==0.27.3 37 | packaging==23.0 38 | pandas==2.0.0 39 | Pillow==9.5.0 40 | posthog==2.4.2 41 | pydantic==1.10.7 42 | python-dateutil==2.8.2 43 | python-dotenv==1.0.0 44 | pytz==2023.3 45 | PyYAML==6.0 46 | regex==2023.3.23 47 | requests==2.28.2 48 | scikit-learn==1.2.2 49 | scipy==1.10.1 50 | sentence-transformers==2.2.2 51 | sentencepiece==0.1.97 52 | 
six==1.16.0 53 | sniffio==1.3.0 54 | SQLAlchemy==1.4.47 55 | starlette==0.26.1 56 | sympy==1.11.1 57 | tenacity==8.2.2 58 | threadpoolctl==3.1.0 59 | tokenizers==0.13.2 60 | torch==2.0.0 61 | torchvision==0.15.1 62 | tqdm==4.65.0 63 | transformers==4.27.4 64 | typing-inspect==0.8.0 65 | typing_extensions==4.5.0 66 | tzdata==2023.3 67 | urllib3==1.26.15 68 | uvicorn==0.21.1 69 | uvloop==0.17.0 70 | watchfiles==0.19.0 71 | websockets==11.0 72 | yarl==1.8.2 73 | zstandard==0.20.0 74 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DoppelgangerGPT 2 | 3 |    4 | 5 | This GitHub repository uses OpenAI API, vector search, and langchain to create a personalized digital doppelganger that mimics your language and communication style. Doppelganger provides an AI-based chatbot experience that reflects the user's personality based on KakaoTalk chat data. 6 | 7 | ## Installation 8 | 9 | To install the dependencies, run the following command: 10 | 11 | ``` 12 | pip install -r requirements.txt 13 | ``` 14 | 15 | ## Environment Variables 16 | 17 | Create a .env file in the root folder and add the following line: 18 | 19 | ``` 20 | OPENAI_API_KEY="YOUR_OPENAI_API_KEY" 21 | ``` 22 | 23 | Make sure to replace YOUR_OPENAI_API_KEY with your actual OpenAI API key. 24 | 25 | ## Dataset Setup 26 | 27 | Export your KakaoTalk chat data and save it as KakaoTalkChats.txt. Then, move the file to the data/kakaotalk_data/ folder. 28 | 29 | ## Usage 30 | 31 | To process the data, run the following commands: 32 | 33 | ``` 34 | cd data/kakaotalk_data 35 | python process_data.py 36 | ``` 37 | 38 | This will create a /db folder in the root directory. 
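The source of `process_data.py` is not shown here, so the following is only a minimal sketch of how lines in the export format above might be parsed before they are embedded. It assumes the line layout seen in the sample `KakaoTalkChats.txt` (Korean date, `오전`/`오후`, time, then `speaker : text`) and merges consecutive messages from the same speaker, since one thought is often split across several messages. The names `Message`, `LINE_RE`, and `parse_chat` are hypothetical and do not come from the repository.

```python
import re
from typing import NamedTuple


class Message(NamedTuple):
    speaker: str
    text: str


# Assumed layout: "2022년 1월 1일 오후 1:01, my_name : hello"
LINE_RE = re.compile(
    r"^\d{4}년 \d{1,2}월 \d{1,2}일 (?:오전|오후) \d{1,2}:\d{2}, "
    r"(?P<speaker>.+?) : (?P<text>.*)$"
)


def parse_chat(lines):
    """Parse export lines, merging consecutive messages from one speaker."""
    messages = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match is None:
            # Header lines, date separators, and blank lines are skipped.
            continue
        speaker, text = match["speaker"], match["text"]
        if messages and messages[-1].speaker == speaker:
            # Same speaker as the previous message: join the split parts.
            messages[-1] = Message(speaker, messages[-1].text + " " + text)
        else:
            messages.append(Message(speaker, text))
    return messages


sample = [
    "others_name 님과 카카오톡 대화",
    "저장한 날짜 : 2023년 1월 1일 오전 1:01",
    "",
    "2023년 1월 1일 오후 1:01",
    "2022년 1월 1일 오후 1:01, my_name : this is my example chat",
    "2022년 1월 1일 오후 1:01, others_name : this is other's splitted",
    "2022년 1월 1일 오후 1:01, others_name : example chat",
]

messages = parse_chat(sample)
for message in messages:
    print(f"{message.speaker}: {message.text}")
```

This sketch drops any line that does not match the pattern, so a message containing line breaks would lose its continuation lines; a real script would need to decide how to handle those.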
39 | 40 | Next, run the following command to start the chatbot: 41 | 42 | ``` 43 | python main.py 44 | ``` 45 | 46 | ## Examples 47 | 48 | Examples from a previous version (the speakers' names were set to "상대방" ("the other person") and "나" ("me")): 49 | 50 |
51 |
52 |
53 |