├── .gitignore
├── README.md
├── ai_to_whatsapp.py
├── chat.py
├── chat_process.py
├── colab.ipynb
├── img
    ├── file_download.png
    └── file_upload.png
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | share/python-wheels/
 24 | *.egg-info/
 25 | .installed.cfg
 26 | *.egg
 27 | MANIFEST
 28 | 
 29 | # PyInstaller
 30 | #  Usually these files are written by a python script from a template
 31 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 32 | *.manifest
 33 | *.spec
 34 | 
 35 | # Installer logs
 36 | pip-log.txt
 37 | pip-delete-this-directory.txt
 38 | 
 39 | # Unit test / coverage reports
 40 | htmlcov/
 41 | .tox/
 42 | .nox/
 43 | .coverage
 44 | .coverage.*
 45 | .cache
 46 | nosetests.xml
 47 | coverage.xml
 48 | *.cover
 49 | *.py,cover
 50 | .hypothesis/
 51 | .pytest_cache/
 52 | cover/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | .pybuilder/
 76 | target/
 77 | 
 78 | # Jupyter Notebook
 79 | .ipynb_checkpoints
 80 | 
 81 | # IPython
 82 | profile_default/
 83 | ipython_config.py
 84 | 
 85 | # pyenv
 86 | #   For a library or package, you might want to ignore these files since the code is
 87 | #   intended to run in multiple environments; otherwise, check them in:
 88 | # .python-version
 89 | 
 90 | # pipenv
 91 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 92 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 93 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 94 | #   install all needed dependencies.
 95 | #Pipfile.lock
 96 | 
 97 | # poetry
 98 | #   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 99 | #   This is especially recommended for binary packages to ensure reproducibility, and is more
100 | #   commonly ignored for libraries.
101 | #   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102 | #poetry.lock
103 | 
104 | # pdm
105 | #   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106 | #pdm.lock
107 | #   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108 | #   in version control.
109 | #   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
110 | .pdm.toml
111 | .pdm-python
112 | .pdm-build/
113 | 
114 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
115 | __pypackages__/
116 | 
117 | # Celery stuff
118 | celerybeat-schedule
119 | celerybeat.pid
120 | 
121 | # SageMath parsed files
122 | *.sage.py
123 | 
124 | # Environments
125 | .env
126 | .venv
127 | env/
128 | venv/
129 | ENV/
130 | env.bak/
131 | venv.bak/
132 | 
133 | # Spyder project settings
134 | .spyderproject
135 | .spyproject
136 | 
137 | # Rope project settings
138 | .ropeproject
139 | 
140 | # mkdocs documentation
141 | /site
142 | 
143 | # mypy
144 | .mypy_cache/
145 | .dmypy.json
146 | dmypy.json
147 | 
148 | # Pyre type checker
149 | .pyre/
150 | 
151 | # pytype static type analyzer
152 | .pytype/
153 | 
154 | # Cython debug symbols
155 | cython_debug/
156 | 
157 | # PyCharm
158 | #  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
159 | #  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
160 | #  and can be added to the global gitignore or merged into this file.  For a more nuclear
161 | #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
162 | #.idea/
163 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Make-AI-Clone-of-Yourself
 2 | 
 3 | ### Full Video Tutorial on [Youtube](https://youtu.be/a2_ZvzE55cA) - <https://youtu.be/a2_ZvzE55cA>
 4 | 
 5 | ## Motivation
 6 | 
 7 | I saw a reel on Instagram in which an AI enthusiast created an AI clone of himself to talk to his girlfriend (Certainly, I won't do that... xd) using [RAG](https://youtu.be/YVWxbHJakgg?feature=shared) Retrieval Augmented Generation and the Chat GPT-3.5 turbo API. It kind of worked, but it had major privacy issues. Sending personal chats to Chat GPT could potentially result in those chats being used by OpenAI to train its model. This led me to think, what if I fine-tuned a pre-existing model like Llama or Mixtral on my personal WhatsApp chat history? It would be cool to have a model that can talk like me as I do on WhatsApp, primarily in Hinglish (Hindi + English).
 8 | 
 9 | [Fine-tuning a large language model (LLM)](https://youtu.be/YVWxbHJakgg?feature=shared) requires a good understanding of LLMs in general. The major challenge with fine-tuning big models, such as a 7B parameter model, is that it requires a minimum of 32GB of GPU RAM, which is costly and not available in free-tier GPU compute services like Colab or Kaggle. So, I had to find a way around this limitation.
10 | 
11 | Another important aspect is that the fine-tuning results heavily depend on the quality and size of the dataset used. Converting raw WhatsApp chat data into a usable dataset is challenging but worth pursuing.
12 | 
13 | Let's see how it looks in reality and how it's being carried out.
14 | 
15 | ## Google Colab Notebook
16 | 
17 | Here is link of [Google Colab Notebook](https://colab.research.google.com/drive/1OGkiAZsYfShY0o8ZphCUuXkmb2Om422X?usp=sharing)
18 | 
19 | * This Notebook Contains - 
20 |     * Data Collection From Whatsapp
21 |     * Data Prepration
22 |     * Data Filtering
23 |     * Model Training
24 |     * Inference
25 |     * Saving Finetuned Model
26 |     * GGUF Conversion
27 |     * Downloading Saved Model
28 | 
29 | 
30 | ## Chatting With Fine-Tuned Model Using Ollama (You can also use [LM Studio](https://lmstudio.ai/) & [GPT4ALL](https://www.nomic.ai/gpt4all))
31 | 
32 | ### Downloading Ollama
33 | Go to the [downloads page of Ollama](https://ollama.com/download) and download and install it according to your OS.
34 | 
35 | ### Loading the Model into Ollama
36 | 1. Open your file manager.
37 | 2. Navigate to the directory where you have downloaded the fine-tuned model, generally the Downloads folder.
38 | 3. Right-click anywhere on the screen and choose "Open terminal here." If you are using Windows, you can directly type `cmd` into the address bar and hit enter.
39 | 4. Type `ollama --version` into the terminal to check if you have installed Ollama successfully.
40 | 5. Make sure the model `unsloth.Q8_0.gguf` and `Modelfile` are in the current directory.
41 | 6. Open `Modelfile` using any text editor.
42 | 7. Edit the first line to ensure it looks like `FROM ./unsloth.Q8_0.gguf`, and save it.
43 | 8. Type this command: `ollama create my_model -f Modelfile`. This will add the model into Ollama.
44 | 9. Now type `ollama run my_model`.
45 | 10. You can now chat with your model.
46 | 
47 | ## Using Model to Automate WhatsApp
48 | Here comes the final part. I am using this wonderful tool [WPP_Whatsapp](https://github.com/3mora2/WPP_Whatsapp) to automate WhatsApp. By using this, we can use our model to respond to any incoming messages on WhatsApp. We can define specific people to talk to. Here is the step-by-step guide:
49 | 
50 | 1. Clone this repo using:
51 |     `git clone https://github.com/Eviltr0N/Make-AI-Clone-of-Yourself.git`
52 | 
53 | 2. Go to the cloned repo:
54 |     `cd Make-AI-Clone-of-Yourself/`
55 | 
56 | 3. Install all the required packages by:
57 |     `pip install -r requirements.txt`
58 | 
59 | 4. First run:
60 |     `python3 chat.py`
61 | 
62 |    Then type something like "Hello" and hit enter. If it works, it means everything is set up correctly.
63 | 
64 | 5. Exit by pressing `Ctrl+C`.
65 | 6. Now run:
66 |     `python3 ai_to_whatsapp.py`
67 | 
68 | 7. It will take a bit of time at first to download the Chromium browser. As it finishes, a browser window will appear, and you have to scan the QR code using WhatsApp to link your WhatsApp account.
69 | 8. Switch back to the terminal. You have to provide the phone number of the other person you want to respond to with this AI model.
70 | 9. The phone number must include the country code without the `+` symbol, such as `916969696969`, then press enter.
71 | 10. As soon as that person sends any message, it will be printed on the terminal, and the AI model will respond to it.
72 | 
73 | ### Keep in Mind
74 | * If you want to change the temperature and top_k of the model (in simpler terms, temperature means creativity of the model), then:
75 |     1. Open the `ai_to_whatsapp.py` file using any text editor.
76 |     2. Go to the 9th line where you will find:
77 |        `my_llm = LLM("my_model", 0.3, 50, 128)`
78 | 
79 |     3. Change `0.3` to a higher value such as `0.9` to increase the temperature (creativity) of the model.
80 |     4. You can also change top_k from `50` to a higher value like `95`.
81 | 
82 | * Temperature can vary from `0.1` to `1` and top_k from `1` to `100`. The higher the temperature, the more creative and unpredictable the model becomes.
83 | * Keep playing with the values of `temperature` and `top_k` until you are satisfied with the model's responses.
84 | 
85 | 
86 | ## What's Next
87 | * Adding multimodal capabilities so this can understand and react to images too, as it currently supports only text messages.
88 | 
89 | * Adding some sort of agent pipeline above this model so another model, such as `LLaMA 3.1`, can see the message and the response of this model and judge if it is suitable or not, and can modify the model response accordingly.
90 | 


--------------------------------------------------------------------------------
/ai_to_whatsapp.py:
--------------------------------------------------------------------------------
 1 | from WPP_Whatsapp import Create
 2 | from langchain_core.messages import HumanMessage, AIMessage
 3 | import logging
 4 | import csv
 5 | from chat import LLM
 6 | 
 7 | #========LLM(Model_name, temperature, top_k, max_tokens)
 8 | 
 9 | my_llm = LLM("my_model", 0.3, 50, 128)
10 | 
11 | # logger = logging.getLogger(name="WPP_Whatsapp")
12 | # logger.setLevel(logging.DEBUG)
13 | 
14 | 
15 | your_session_name = "test"
16 | creator = Create(session=your_session_name)
17 | 
18 | 
19 | client = creator.start()
20 | 
21 | 
22 | if creator.state != 'CONNECTED':
23 |     raise Exception(creator.state)
24 | 
25 | 
26 | def save_chat(ph_number, msg):
27 |     with open(f'{ph_number}.csv', "a") as f:
28 |         csv_writer = csv.DictWriter(f, fieldnames=["USER", "AI"])
29 |         csv_writer.writerow(msg)
30 | 
31 | def new_message(message):
32 |     global client, gender
33 | 
34 |     if message and not message.get("isGroupMsg"):
35 |         chat_id = message.get("from")
36 |         message_id = message.get("id")
37 |         if chat_id==f'{ph_number}@c.us':
38 |             print("Sender: ", message.get("body"))
39 |             client.sendSeen(chat_id)
40 |             res = my_llm.chat(message.get("body"))
41 |             print("AI: ", res)
42 |             client.startTyping(chat_id)
43 |             res_list = res.split("\n ")
44 |             client.reply(chat_id, res_list[0], message_id)
45 |             client.stopTyping(chat_id)
46 |             for msg in range(1, len(res_list)):
47 |                 client.startTyping(chat_id)
48 |                 client.sendText(chat_id, res_list[msg])
49 |             client.stopTyping(chat_id)
50 |             save_chat(ph_number, {"USER": message.get("body"), "AI": res})
51 | 
52 | creator.client.ThreadsafeBrowser.page_evaluate_sync("""
53 |  // Resolvenndo bug 'TypeError: i.Wid.isStatusV3 is not a function'
54 |     if(!WPP.whatsapp.Wid.isStatusV3) {
55 |       WPP.whatsapp.Wid.isStatusV3 = () => false
56 |     }
57 | """)
58 | 
59 | 
60 | 
61 | ph_number = input("Enter Ph. No. with Country code: ")
62 | while len(ph_number) < 12:
63 |     ph_number = input("WITH Countary Code: ")
64 | 
65 | 
66 | 
67 | creator.client.onMessage(new_message)
68 | creator.loop.run_forever()
69 | 


--------------------------------------------------------------------------------
/chat.py:
--------------------------------------------------------------------------------
 1 | from langchain_community.chat_models import ChatOllama
 2 | from langchain_core.output_parsers import StrOutputParser
 3 | from langchain_core.messages import HumanMessage, AIMessage
 4 | from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
 5 | class LLM:
 6 |     def __init__(self, model='llama3', temp=0.3, top_k=40, max_tokens=128):
 7 |         self.llm =  ChatOllama(
 8 |             model = model,
 9 |             keep_alive=-1,
10 |             temperature=temp,
11 |             num_predict = max_tokens,
12 |             top_k = top_k,
13 |             # top_p = 0.9
14 |         )
15 |         self.chat_history = []
16 |         self.max_chat_history = 20 + 20
17 |         self.prompt_template = ChatPromptTemplate.from_messages(
18 |             [
19 |                 MessagesPlaceholder(variable_name="chat_history"),
20 |                 ("human", "{input}"),
21 |             ]
22 |         )
23 |         self.chain = self.prompt_template | self.llm | StrOutputParser()
24 |     def chat(self, prompt):
25 |         self.res = self.chain.invoke({"input": prompt, "chat_history": self.chat_history})
26 |         self.chat_history.append(HumanMessage(content=prompt))
27 |         self.chat_history.append(AIMessage(content=self.res))
28 |         if len(self.chat_history) > self.max_chat_history:
29 |             self.chat_history.pop(0)
30 |             self.chat_history.pop(1)
31 |         # print(len(self.chat_history))
32 |         return self.res
33 | def testing():
34 |     ml = LLM("my_model", 0.4, 40, 128)
35 |     while True:
36 |         question = input("You: ")
37 |         if question == "done":
38 |             return
39 |         print(ml.chat(question))
40 | if __name__ == "__main__":
41 |     testing()


--------------------------------------------------------------------------------
/chat_process.py:
--------------------------------------------------------------------------------
  1 | import re
  2 | import os
  3 | import shutil
  4 | import csv
  5 | 
  6 | chat_dir = "chats"
  7 | 
  8 | filler_words = ["ok", "hn", "Okay"]
  9 | class Wh_Chat_Processor:
 10 |     def __init__(self):
 11 |         pass
 12 |     def open_chat_file(self, dir,filename):
 13 |         self.sender_name = filename.replace("WhatsApp Chat with ", "").replace(".txt", "")
 14 |         with open(os.path.join(dir,filename)) as f:
 15 |             chat_text = f.read()
 16 |         return chat_text
 17 |     
 18 |     def msg_filter_basic(self, chat_text):
 19 |         filtered = []
 20 |         pt = r' - ([^:]+): (.*?)(?=\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s*(?:AM|PM|am|pm)? - |$)'
 21 |         msgs = re.findall(pt, chat_text, re.DOTALL)
 22 |         for msg in msgs:
 23 |             line = msg[1]
 24 |             wh_default_filter = "Tap to learn more." in line or "<Media omitted>" in line
 25 |             website_filter = "https://" in line or "http://" in line
 26 |             mail_filter = "@gmail.com" in line
 27 |             deleted_msg_filter = "This message was deleted" in line or "You deleted this message" in line or "<This message was edited>" in line or "(file attached)" in line
 28 | 
 29 |             if not (wh_default_filter or website_filter or mail_filter or deleted_msg_filter):
 30 |                     filtered.append(msg)
 31 |         return filtered
 32 | 
 33 |     def process_chat(self, chat_data):
 34 |         merged_lines = []
 35 |         current_sender = None
 36 |         current_message = {}
 37 |         for line in chat_data:
 38 |             if not line:
 39 |                 continue
 40 |             parts = line
 41 |             if len(parts) == 2:
 42 |                 sender, message = parts
 43 |                 if current_sender is None:
 44 |                     current_sender = sender
 45 |                     current_message[current_sender] = [message.strip()]
 46 |                 elif sender == current_sender:
 47 |                     current_message[current_sender].append(message.strip())
 48 |                 else:
 49 |                     merged_lines.append(current_message)
 50 |                     current_sender = sender
 51 |                     current_message = {current_sender: [message.strip()]}
 52 |             else:
 53 |                 if current_sender:
 54 |                     current_message[current_sender][-1] += " " + line.strip()
 55 |         if current_sender:
 56 |             merged_lines.append(current_message)
 57 |         keys = set() 
 58 |         for line in merged_lines:
 59 |             # print(line)
 60 |             for key in line.keys():
 61 |                 if key != self.sender_name:
 62 |                     keys.add(key)
 63 |         self.my_name = list(keys)[0]
 64 |         print(list(keys))
 65 |         return merged_lines
 66 | 
 67 |     def advance_filter(self, merged_chat_data):
 68 |         filtered_data=[]
 69 |         sender = ""
 70 |         me = ""
 71 |         chk = 1
 72 |         CD = merged_chat_data
 73 |         for ind, x in enumerate(CD):
 74 |             if x.get(self.sender_name) != None :
 75 |                 if len(x[self.sender_name]) == 1 and ( x[self.sender_name][0] in filler_words or len(x[self.sender_name][0]) ==1 ):
 76 |                     continue      
 77 |                 if len(CD[ind][self.sender_name]) > 1:
 78 |                     for y in range(0,len(CD[ind][self.sender_name])):
 79 |                         if y+1 != len(CD[ind][self.sender_name]):
 80 |                             sender += CD[ind][self.sender_name][y] + "\n"
 81 |                         else:
 82 |                             sender += CD[ind][self.sender_name][y]
 83 |                 else:
 84 |                     sender += CD[ind][self.sender_name][0]
 85 |             elif x.get(self.my_name) != None and len(sender) > 1:
 86 |                 if len(CD[ind][self.my_name]) > 1:
 87 |                     for y in range(0,len(CD[ind][self.my_name])):
 88 |                         if y+1 != len(CD[ind][self.my_name]):
 89 |                             me += CD[ind][self.my_name][y] + "\n"
 90 |                         else:
 91 |                             me += CD[ind][self.my_name][y]
 92 |                 else:
 93 |                     me += CD[ind][self.my_name][0]
 94 |             else:
 95 |                 continue
 96 |             if chk ==1:
 97 |                 chk+=1
 98 |             elif chk ==2:
 99 |                 filtered_data.append([sender, me])
100 |                 sender = ""
101 |                 me=""
102 |                 chk=1
103 |             else:
104 |                 pass
105 |         return filtered_data
106 | 
107 | with open("all_chat_data.csv", "w") as f:
108 |     f.write("Prompt,Response"+ "\n")
109 | 
110 | for file in os.listdir(os.path.join(chat_dir)):
111 |     if file.endswith('.zip'):
112 |         full_path = os.path.join(chat_dir, file)
113 |         shutil.unpack_archive(full_path, chat_dir)
114 | 
115 | for file in os.listdir(os.path.join(chat_dir)):
116 |     processor = Wh_Chat_Processor()
117 |     if file.endswith('.txt'):
118 |         print("Processing: ",file)
119 |         chat_d = processor.open_chat_file(chat_dir,file)
120 |         basic_f = processor.msg_filter_basic(chat_d)
121 |         chat_ps = processor.process_chat(basic_f)
122 |         filtered_data = processor.advance_filter(chat_ps)
123 |         with open("all_chat_data.csv", "a") as f:
124 |             csv_writer = csv.writer(f)
125 |             for row in filtered_data:
126 |                 csv_writer.writerow(row)


--------------------------------------------------------------------------------
/colab.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |   "cells": [
   3 |     {
   4 |       "cell_type": "code",
   5 |       "execution_count": null,
   6 |       "metadata": {
   7 |         "id": "CgxobX5hwzbh"
   8 |       },
   9 |       "outputs": [],
  10 |       "source": [
  11 |         "# Installs Unsloth, Xformers (Flash Attention) and all other packages\n",
  12 |         "!pip install \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\"\n",
  13 |         "!pip install --no-deps \"xformers<0.0.27\" \"trl<0.9.0\" peft accelerate bitsandbytes"
  14 |       ]
  15 |     },
  16 |     {
  17 |       "cell_type": "markdown",
  18 |       "metadata": {
  19 |         "id": "8VLYYB1Bwzbi"
  20 |       },
  21 |       "source": [
  22 |         "### Background\n",
  23 |         "\n",
  24 |         "I saw a reel on Instagram in which an AI enthusiast created an AI clone of himself to talk to his girlfriend (Certainly, I won't do that... xd) using [RAG](https://youtu.be/YVWxbHJakgg?feature=shared) Retrieval Augmented Generation and the Chat GPT-3.5 turbo API. It kind of worked, but it had major privacy issues. Sending personal chats to Chat GPT could potentially result in those chats being used by OpenAI to train its model. This led me to think, what if I fine-tuned a pre-existing model like Llama or Mixtral on my personal WhatsApp chat history? It would be cool to have a model that can talk like me as I do on WhatsApp, primarily in Hinglish (Hindi + English).\n",
  25 |         "\n",
  26 |         "[Fine-tuning a large language model (LLM)](https://youtu.be/YVWxbHJakgg?feature=shared) requires a good understanding of LLMs in general. The major challenge with fine-tuning big models, such as a 7B parameter model, is that it requires a minimum of 32GB of GPU RAM, which is costly and not available in free-tier GPU compute services like Colab or Kaggle. So, I had to find a way around this limitation.\n",
  27 |         "\n",
  28 |         "Another important aspect is that the fine-tuning results heavily depend on the quality and size of the dataset used. Converting raw WhatsApp chat data into a usable dataset is challenging but worth pursuing.\n",
  29 |         "\n",
  30 |         "Let's see how it looks in reality and how it's being carried out."
  31 |       ]
  32 |     },
  33 |     {
  34 |       "cell_type": "markdown",
  35 |       "metadata": {
  36 |         "id": "IBoJ_Xw7wzbl"
  37 |       },
  38 |       "source": [
  39 |         "### Important\n",
  40 |         "\n",
  41 |         "We are using a free instance of Google Colab to fine-tune our model`(Llama3)`, making it **totally free**.\n",
  42 |         "\n",
  43 |         "For chatting with our fine-tuned model, we will use [Ollama](https://ollama.com/) locally, which is very lightweight and requires only **8GB** of free RAM in your laptop/PC and works without any **GPU** support.\n",
  44 |         "\n",
  45 |         "**Keep in mind that your chat data is completely safe; it is not being sent to anyone.**"
  46 |       ]
  47 |     },
  48 |     {
  49 |       "cell_type": "markdown",
  50 |       "metadata": {
  51 |         "id": "vITh0KVJ10qX"
  52 |       },
  53 |       "source": [
  54 |         "\n",
  55 |         "### Data Prep\n",
  56 |         "To extract chat history from your WhatsApp chats, follow these steps:\n",
  57 |         "\n",
  58 |         "1. Open your WhatsApp application.\n",
  59 |         "2. Go to the chat from which you want to extract the chat history.\n",
  60 |         "3. Click on the three dots in the top right corner of the screen.\n",
  61 |         "4. Click on `More` then click on `Export Chat`.\n",
  62 |         "5. Select `Without media`.\n",
  63 |         "6. Save it locally or send it to saved messages on Telegram so you can later download it on your Telegram desktop.\n",
  64 |         "7. Repeat these steps for all of your chats. The more chat data you have, the better the results will be.\n",
  65 |         "\n",
  66 |         "It will generate `.zip file`. You don't have to extract it.\n",
  67 |         "\n",
  68 |         "\n"
  69 |       ]
  70 |     },
  71 |     {
  72 |       "cell_type": "markdown",
  73 |       "metadata": {
  74 |         "id": "TqvPGRZDwzbl"
  75 |       },
  76 |       "source": [
  77 |         "#### Upload Exported Chat files to Colab runtime\n",
  78 |         "Now, locate your exported chat zip files and upload them to the Colab runtime. Follow these steps to upload files to Google Colab:\n",
  79 |         "\n",
  80 |         "1. Click on the Files icon on the left side of the screen (as shown in the image attached below).\n",
  81 |         "2. Click on the upload button. It will open the File Explorer. Choose the exported chat zip files (you can select multiple files at once).\n",
  82 |         "    * Wait until your files are uploaded. The upload process bar will display at the bottom left corner of the screen.\n",
  83 |         "    * Once files are uploaded successfully, they will appear in the Files tab of Google Colab.\n",
  84 |         "<img src=\"https://github.com/Eviltr0N/Make-AI-Clone-of-Yourself/raw/main/img/file_download.png\">\n",
  85 |         "\n",
  86 |         "##### Keep in Mind:\n",
  87 |         "\n",
  88 |         "* Export chat history only for meaningful conversations. Before exporting a chat, consider whether it adds value to the data or if it is just a short conversation.\n",
  89 |         "* If you think you don’t want the AI to learn from a specific chat, then don’t export it.\n",
  90 |         "* Currently, it only supports individual chats, so please do not export group chats.\n"
  91 |       ]
  92 |     },
  93 |     {
  94 |       "cell_type": "markdown",
  95 |       "metadata": {
  96 |         "id": "O5JN21BSwzbm"
  97 |       },
  98 |       "source": [
  99 |         "### Data Filtering\n",
 100 |         "\n",
 101 |         "The exported data contains many irregularities such as `<Media omitted>`, `This message was deleted` and timestamps of messages. We need to remove these and convert the whole chat history into a `Prompt: Response` format so it can be used to fine-tune the model. To extract messages from the data, I used `regex`. Additionally, I filtered out any links and emails from the chat data for obvious privacy reasons.\n",
 102 |         "\n",
 103 |         "**Now, please edit the below list of `filler_words`**. These words may vary from person to person. Some examples are `Ok`, `Yup`, `Hmm`, `Han`. We need to remove these from the dataset because if we don't, the fine-tuned model will primarily learn these words and, in most cases, respond with them. For example, if you ask, \"Where are you going?\" the model might respond with something like \"Ok\" or \"Hmm.\""
 104 |       ]
 105 |     },
 106 |     {
 107 |       "cell_type": "markdown",
 108 |       "metadata": {
 109 |         "id": "44cyft5Jwzbm"
 110 |       },
 111 |       "source": [
 112 |         "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!"
 113 |       ]
 114 |     },
 115 |     {
 116 |       "cell_type": "code",
 117 |       "execution_count": 2,
 118 |       "metadata": {
 119 |         "id": "zPGexEG7wzbm"
 120 |       },
 121 |       "outputs": [],
 122 |       "source": [
 123 |         "filler_words = [\"Ok\", \"Okay\", \"Yup\", \"Hmm\"]\n",
 124 |         "# Add or remove words from this list based on your personal usage.\n",
 125 |         "\n",
 126 |         "chat_dir = \"./\""
 127 |       ]
 128 |     },
 129 |     {
 130 |       "cell_type": "code",
 131 |       "execution_count": 3,
 132 |       "metadata": {
 133 |         "id": "BdgsnIpGwzbn"
 134 |       },
 135 |       "outputs": [],
 136 |       "source": [
 137 |         "import re\n",
 138 |         "import os\n",
 139 |         "import shutil\n",
 140 |         "import csv"
 141 |       ]
 142 |     },
 143 |     {
 144 |       "cell_type": "code",
 145 |       "execution_count": 4,
 146 |       "metadata": {
 147 |         "id": "-PvTvhPVwzbn"
 148 |       },
 149 |       "outputs": [],
 150 |       "source": [
 151 |         "class Wh_Chat_Processor:\n",
 152 |         "    def __init__(self):\n",
 153 |         "        pass\n",
 154 |         "    def open_chat_file(self, dir,filename):\n",
 155 |         "        self.sender_name = filename.replace(\"WhatsApp Chat with \", \"\").replace(\".txt\", \"\")\n",
 156 |         "        with open(os.path.join(dir,filename)) as f:\n",
 157 |         "            chat_text = f.read()\n",
 158 |         "        return chat_text\n",
 159 |         "\n",
 160 |         "    def msg_filter_basic(self, chat_text):\n",
 161 |         "        filtered = []\n",
 162 |         "        pt = r' - ([^:]+): (.*?)(?=\\d{1,2}/\\d{1,2}/\\d{2,4}, \\d{1,2}:\\d{2}\\s*(?:AM|PM|am|pm)? - |$)'\n",
 163 |         "        msgs = re.findall(pt, chat_text, re.DOTALL)\n",
 164 |         "        for msg in msgs:\n",
 165 |         "            line = msg[1]\n",
 166 |         "            wh_default_filter = \"Tap to learn more.\" in line or \"<Media omitted>\" in line\n",
 167 |         "            website_filter = \"https://\" in line or \"http://\" in line\n",
 168 |         "            mail_filter = \"@gmail.com\" in line\n",
 169 |         "            deleted_msg_filter = \"This message was deleted\" in line or \"You deleted this message\" in line or \"<This message was edited>\" in line or \"(file attached)\" in line\n",
 170 |         "\n",
 171 |         "            if not (wh_default_filter or website_filter or mail_filter or deleted_msg_filter):\n",
 172 |         "                    filtered.append(msg)\n",
 173 |         "        return filtered\n",
 174 |         "\n",
 175 |         "    def process_chat(self, chat_data):\n",
 176 |         "        merged_lines = []\n",
 177 |         "        current_sender = None\n",
 178 |         "        current_message = {}\n",
 179 |         "        for line in chat_data:\n",
 180 |         "            if not line:\n",
 181 |         "                continue\n",
 182 |         "            parts = line\n",
 183 |         "            if len(parts) == 2:\n",
 184 |         "                sender, message = parts\n",
 185 |         "                if current_sender is None:\n",
 186 |         "                    current_sender = sender\n",
 187 |         "                    current_message[current_sender] = [message.strip()]\n",
 188 |         "                elif sender == current_sender:\n",
 189 |         "                    current_message[current_sender].append(message.strip())\n",
 190 |         "                else:\n",
 191 |         "                    merged_lines.append(current_message)\n",
 192 |         "                    current_sender = sender\n",
 193 |         "                    current_message = {current_sender: [message.strip()]}\n",
 194 |         "            else:\n",
 195 |         "                if current_sender:\n",
 196 |         "                    current_message[current_sender][-1] += \" \" + line.strip()\n",
 197 |         "        if current_sender:\n",
 198 |         "            merged_lines.append(current_message)\n",
 199 |         "        keys = set()\n",
 200 |         "        for line in merged_lines:\n",
 201 |         "            # print(line)\n",
 202 |         "            for key in line.keys():\n",
 203 |         "                if key != self.sender_name:\n",
 204 |         "                    keys.add(key)\n",
 205 |         "        self.my_name = list(keys)[0]\n",
 206 |         "        print(list(keys))\n",
 207 |         "        return merged_lines\n",
 208 |         "\n",
 209 |         "    def advance_filter(self, merged_chat_data):\n",
 210 |         "        filtered_data=[]\n",
 211 |         "        sender = \"\"\n",
 212 |         "        me = \"\"\n",
 213 |         "        chk = 1\n",
 214 |         "        CD = merged_chat_data\n",
 215 |         "        for ind, x in enumerate(CD):\n",
 216 |         "            if x.get(self.sender_name) != None :\n",
 217 |         "                if len(x[self.sender_name]) == 1 and ( x[self.sender_name][0] in filler_words or len(x[self.sender_name][0]) ==1 ):\n",
 218 |         "                    continue\n",
 219 |         "                if len(CD[ind][self.sender_name]) > 1:\n",
 220 |         "                    for y in range(0,len(CD[ind][self.sender_name])):\n",
 221 |         "                        if y+1 != len(CD[ind][self.sender_name]):\n",
 222 |         "                            sender += CD[ind][self.sender_name][y] + \"\\n\"\n",
 223 |         "                        else:\n",
 224 |         "                            sender += CD[ind][self.sender_name][y]\n",
 225 |         "                else:\n",
 226 |         "                    sender += CD[ind][self.sender_name][0]\n",
 227 |         "            elif x.get(self.my_name) != None and len(sender) > 1:\n",
 228 |         "                if len(CD[ind][self.my_name]) > 1:\n",
 229 |         "                    for y in range(0,len(CD[ind][self.my_name])):\n",
 230 |         "                        if y+1 != len(CD[ind][self.my_name]):\n",
 231 |         "                            me += CD[ind][self.my_name][y] + \"\\n\"\n",
 232 |         "                        else:\n",
 233 |         "                            me += CD[ind][self.my_name][y]\n",
 234 |         "                else:\n",
 235 |         "                    me += CD[ind][self.my_name][0]\n",
 236 |         "            else:\n",
 237 |         "                continue\n",
 238 |         "            if chk ==1:\n",
 239 |         "                chk+=1\n",
 240 |         "            elif chk ==2:\n",
 241 |         "                filtered_data.append([sender, me])\n",
 242 |         "                sender = \"\"\n",
 243 |         "                me=\"\"\n",
 244 |         "                chk=1\n",
 245 |         "            else:\n",
 246 |         "                pass\n",
 247 |         "        return filtered_data\n"
 248 |       ]
 249 |     },
 250 |     {
 251 |       "cell_type": "code",
 252 |       "execution_count": 32,
 253 |       "metadata": {
 254 |         "id": "0DVwbFW1wzbo"
 255 |       },
 256 |       "outputs": [],
 257 |       "source": [
 258 |         "with open(\"all_chat_data.csv\", \"w\") as f:\n",
 259 |         "    f.write(\"Prompt,Response\"+ \"\\n\")\n",
 260 |         "\n",
 261 |         "for file in os.listdir(os.path.join(chat_dir)):\n",
 262 |         "    if file.endswith('.zip'):\n",
 263 |         "        full_path = os.path.join(chat_dir, file)\n",
 264 |         "        shutil.unpack_archive(full_path, chat_dir)"
 265 |       ]
 266 |     },
 267 |     {
 268 |       "cell_type": "code",
 269 |       "execution_count": null,
 270 |       "metadata": {
 271 |         "id": "5f7ieid2wzbo"
 272 |       },
 273 |       "outputs": [],
 274 |       "source": [
 275 |         "for file in os.listdir(os.path.join(chat_dir)):\n",
 276 |         "    processor = Wh_Chat_Processor()\n",
 277 |         "    if file.endswith('.txt'):\n",
 278 |         "        print(\"Processing: \",file)\n",
 279 |         "        chat_d = processor.open_chat_file(chat_dir,file)\n",
 280 |         "        basic_f = processor.msg_filter_basic(chat_d)\n",
 281 |         "        chat_ps = processor.process_chat(basic_f)\n",
 282 |         "        filtered_data = processor.advance_filter(chat_ps)\n",
 283 |         "        with open(\"all_chat_data.csv\", \"a\") as f:\n",
 284 |         "            csv_writer = csv.writer(f)\n",
 285 |         "            for row in filtered_data:\n",
 286 |         "                csv_writer.writerow(row)\n",
 287 |         "print(\"Successfully Processed all the chats... Generated CSV File of chats is saved in Current directory with the name 'all_chat_data.csv'\")"
 288 |       ]
 289 |     },
 290 |     {
 291 |       "cell_type": "markdown",
 292 |       "metadata": {
 293 |         "id": "c7FUCDIrwzbo"
 294 |       },
 295 |       "source": [
 296 |         "### Model Fine-Tuning\n",
 297 |         "As we discussed earlier, fine-tuning a 7B parameter model with just 16GB of RAM is not possible. To achieve this, we will use a technique known as [Quantization](https://huggingface.co/docs/optimum/en/concept_guides/quantization). Specifically, we will use 4-bit quantization.\n",
 298 |         "\n",
 299 |         "I am using [Unsloth](https://github.com/unslothai/unsloth) for the rest of the processes, such as quantization and training the model. Unsloth has very good documentation and requires less VRAM to fine-tune the model."
 300 |       ]
 301 |     },
 302 |     {
 303 |       "cell_type": "markdown",
 304 |       "metadata": {
 305 |         "id": "IqM-T1RTzY6C"
 306 |       },
 307 |       "source": [
 308 |         "Check out - [Unsloth's Github](https://github.com/unslothai/unsloth)\n",
 309 |         "\n",
 310 |         "This notebook uses the `Llama-3` format for conversation style finetunes. We use [Open Assistant conversations](https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style) in ShareGPT style."
 311 |       ]
 312 |     },
 313 |     {
 314 |       "cell_type": "markdown",
 315 |       "metadata": {
 316 |         "id": "r2v_X2fA0Df5"
 317 |       },
 318 |       "source": [
 319 |         "* Unsloth support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc\n",
 320 |         "* Unsloth support 16bit LoRA or 4bit QLoRA. Both 2x faster.\n"
 321 |       ]
 322 |     },
 323 |     {
 324 |       "cell_type": "code",
 325 |       "execution_count": 6,
 326 |       "metadata": {
 327 |         "id": "-jDsHAJLwzbp"
 328 |       },
 329 |       "outputs": [],
 330 |       "source": []
 331 |     },
 332 |     {
 333 |       "cell_type": "markdown",
 334 |       "metadata": {
 335 |         "id": "iSvaRuHpwzbp"
 336 |       },
 337 |       "source": [
 338 |         "For finetuning I am using **`Llama3` 8B Instruct** as our base model, you can use other models such as `Mixtral` and `Gemma`. I have traied Mixtral also but it dosent perform as good as `Llama3`."
 339 |       ]
 340 |     },
 341 |     {
 342 |       "cell_type": "code",
 343 |       "execution_count": null,
 344 |       "metadata": {
 345 |         "id": "QmUBVEnvCDJv"
 346 |       },
 347 |       "outputs": [],
 348 |       "source": [
 349 |         "from unsloth import FastLanguageModel\n",
 350 |         "import torch\n",
 351 |         "max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!\n",
 352 |         "dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+\n",
 353 |         "load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.\n",
 354 |         "\n",
 355 |         "# 4bit pre quantized models we support for 4x faster downloading + no OOMs.\n",
 356 |         "fourbit_models = [\n",
 357 |         "    \"unsloth/mistral-7b-v0.3-bnb-4bit\",      # New Mistral v3 2x faster!\n",
 358 |         "    \"unsloth/mistral-7b-instruct-v0.3-bnb-4bit\",\n",
 359 |         "    \"unsloth/llama-3-8b-bnb-4bit\",           # Llama-3 15 trillion tokens model 2x faster!\n",
 360 |         "    \"unsloth/llama-3-8b-Instruct-bnb-4bit\",\n",
 361 |         "    \"unsloth/llama-3-70b-bnb-4bit\",\n",
 362 |         "    \"unsloth/Phi-3-mini-4k-instruct\",        # Phi-3 2x faster!\n",
 363 |         "    \"unsloth/Phi-3-medium-4k-instruct\",\n",
 364 |         "    \"unsloth/mistral-7b-bnb-4bit\",\n",
 365 |         "    \"unsloth/gemma-7b-bnb-4bit\",             # Gemma 2.2x faster!\n",
 366 |         "] # More models at https://huggingface.co/unsloth\n",
 367 |         "\n",
 368 |         "model, tokenizer = FastLanguageModel.from_pretrained(\n",
 369 |         "    model_name = \"unsloth/llama-3-8b-Instruct-bnb-4bit\", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B\n",
 370 |         "    max_seq_length = max_seq_length,\n",
 371 |         "    dtype = dtype,\n",
 372 |         "    load_in_4bit = load_in_4bit,\n",
 373 |         "    # token = \"hf_...\", # use one if using gated models like meta-llama/Llama-2-7b-hf\n",
 374 |         ")"
 375 |       ]
 376 |     },
 377 |     {
 378 |       "cell_type": "markdown",
 379 |       "metadata": {
 380 |         "id": "SXd9bTZd1aaL"
 381 |       },
 382 |       "source": [
 383 |         "We now add LoRA adapters so we only need to update 1 to 10% of all parameters!"
 384 |       ]
 385 |     },
 386 |     {
 387 |       "cell_type": "code",
 388 |       "execution_count": null,
 389 |       "metadata": {
 390 |         "id": "6bZsfBuZDeCL"
 391 |       },
 392 |       "outputs": [],
 393 |       "source": [
 394 |         "model = FastLanguageModel.get_peft_model(\n",
 395 |         "    model,\n",
 396 |         "    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128\n",
 397 |         "    target_modules = [\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n",
 398 |         "                      \"gate_proj\", \"up_proj\", \"down_proj\",],\n",
 399 |         "    lora_alpha = 16,\n",
 400 |         "    lora_dropout = 0, # Supports any, but = 0 is optimized\n",
 401 |         "    bias = \"none\",    # Supports any, but = \"none\" is optimized\n",
 402 |         "    # [NEW] \"unsloth\" uses 30% less VRAM, fits 2x larger batch sizes!\n",
 403 |         "    use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for very long context\n",
 404 |         "    random_state = 3407,\n",
 405 |         "    use_rslora = False,  # We support rank stabilized LoRA\n",
 406 |         "    loftq_config = None, # And LoftQ\n",
 407 |         ")"
 408 |       ]
 409 |     },
 410 |     {
 411 |       "cell_type": "markdown",
 412 |       "metadata": {
 413 |         "id": "ZaOf7N7twzbq"
 414 |       },
 415 |       "source": [
 416 |         "Let's prepare dataset from the filtered Whatsapp Chat data"
 417 |       ]
 418 |     },
 419 |     {
 420 |       "cell_type": "code",
 421 |       "execution_count": 25,
 422 |       "metadata": {
 423 |         "id": "1AhDjFQSRq6v"
 424 |       },
 425 |       "outputs": [],
 426 |       "source": [
 427 |         "import pandas as pd\n",
 428 |         "from datasets import Dataset, load_dataset\n",
 429 |         "from unsloth.chat_templates import get_chat_template"
 430 |       ]
 431 |     },
 432 |     {
 433 |       "cell_type": "code",
 434 |       "execution_count": 26,
 435 |       "metadata": {
 436 |         "id": "oSS0Znn4wzbq"
 437 |       },
 438 |       "outputs": [],
 439 |       "source": [
 440 |         "tokenizer = get_chat_template(\n",
 441 |         "    tokenizer,\n",
 442 |         "    chat_template=\"llama-3\",  # Use the desired chat template\n",
 443 |         "    mapping={\"role\": \"from\", \"content\": \"value\", \"user\": \"human\", \"assistant\": \"gpt\"}\n",
 444 |         ")\n",
 445 |         "\n",
 446 |         "# Define the formatting function\n",
 447 |         "def formatting_prompts_func(examples):\n",
 448 |         "    convos = examples[\"conversations\"]\n",
 449 |         "    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]\n",
 450 |         "    return {\"text\": texts}\n"
 451 |       ]
 452 |     },
 453 |     {
 454 |       "cell_type": "code",
 455 |       "execution_count": 40,
 456 |       "metadata": {
 457 |         "colab": {
 458 |           "base_uri": "https://localhost:8080/",
 459 |           "height": 49,
 460 |           "referenced_widgets": [
 461 |             "1d777ec6fd994cd0a10b8a5ee5a9ecce",
 462 |             "fb53fae9bb8f43668e58817861a99830",
 463 |             "39eb48a59bbc43459dd403c5361e7277",
 464 |             "8f43afa148464357ac7084cfc82b54f5",
 465 |             "e442477c71da43c3a55b494f01c9adf7",
 466 |             "ea483ec87d2145dc8d94eaaa3767c826",
 467 |             "e8ef5c1da74f4603845780dcc02f64da",
 468 |             "1d018226491848a593d2fc0f983528f0",
 469 |             "accf1601f86d4ff785f39262f82c527a",
 470 |             "f8a76f16cc6243ee9f4063b45cf4ad59",
 471 |             "eb776dfe78db49a4802f03b41ad2e15e"
 472 |           ]
 473 |         },
 474 |         "id": "wFP3QWaViVi8",
 475 |         "outputId": "c8f520b9-7e75-4744-add4-18bf1dbdf4bc"
 476 |       },
 477 |       "outputs": [
 478 |         {
 479 |           "data": {
 480 |             "application/vnd.jupyter.widget-view+json": {
 481 |               "model_id": "1d777ec6fd994cd0a10b8a5ee5a9ecce",
 482 |               "version_major": 2,
 483 |               "version_minor": 0
 484 |             },
 485 |             "text/plain": [
 486 |               "Map:   0%|          | 0/2157 [00:00<?, ? examples/s]"
 487 |             ]
 488 |           },
 489 |           "metadata": {},
 490 |           "output_type": "display_data"
 491 |         }
 492 |       ],
 493 |       "source": [
 494 |         "df = pd.read_csv(\"all_chat_data.csv\")\n",
 495 |         "conversations = []\n",
 496 |         "for _, row in df.iterrows():\n",
 497 |         "    try:\n",
 498 |         "        conversation = [\n",
 499 |         "            {'from': 'human', 'value': str(row['Prompt'])},\n",
 500 |         "            {'from': 'assistant', 'value': str(row['Response'])}\n",
 501 |         "        ]\n",
 502 |         "        conversations.append(conversation)\n",
 503 |         "    except:\n",
 504 |         "        print(_ , row)\n",
 505 |         "\n",
 506 |         "\n",
 507 |         "dataset = Dataset.from_dict({\"conversations\": conversations})\n",
 508 |         "dataset = dataset.map(formatting_prompts_func, batched=True)"
 509 |       ]
 510 |     },
 511 |     {
 512 |       "cell_type": "markdown",
 513 |       "metadata": {
 514 |         "id": "EdsJiFbRwzbq"
 515 |       },
 516 |       "source": [
 517 |         "Let's see how the `Llama-3` format works by printing the 5th element"
 518 |       ]
 519 |     },
 520 |     {
 521 |       "cell_type": "code",
 522 |       "execution_count": null,
 523 |       "metadata": {
 524 |         "id": "yuCMQuonRq0n"
 525 |       },
 526 |       "outputs": [],
 527 |       "source": [
 528 |         "dataset[5][\"conversations\"]"
 529 |       ]
 530 |     },
 531 |     {
 532 |       "cell_type": "code",
 533 |       "execution_count": null,
 534 |       "metadata": {
 535 |         "id": "oJQ66lSywzbr"
 536 |       },
 537 |       "outputs": [],
 538 |       "source": [
 539 |         "print(dataset[5][\"text\"])"
 540 |       ]
 541 |     },
 542 |     {
 543 |       "cell_type": "markdown",
 544 |       "metadata": {
 545 |         "id": "idAEIeSQ3xdS"
 546 |       },
 547 |       "source": [
 548 |         "\n",
 549 |         "### Train the model\n",
 550 |         "Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer).\n",
 551 |         "* I am doing `1 epochs` to speed things up, but you can set `num_train_epochs` to 2 or 3, Just experiment with it.\n",
 552 |         "* If you have large dataset then just go for `1 full epoch`. Do not do more than 3 or 4 epoch if the `training loss` is not **decreasing**."
 553 |       ]
 554 |     },
 555 |     {
 556 |       "cell_type": "code",
 557 |       "execution_count": null,
 558 |       "metadata": {
 559 |         "id": "95_Nn-89DhsL"
 560 |       },
 561 |       "outputs": [],
 562 |       "source": [
 563 |         "from trl import SFTTrainer\n",
 564 |         "from transformers import TrainingArguments\n",
 565 |         "from unsloth import is_bfloat16_supported\n",
 566 |         "\n",
 567 |         "trainer = SFTTrainer(\n",
 568 |         "    model = model,\n",
 569 |         "    tokenizer = tokenizer,\n",
 570 |         "    train_dataset = dataset,\n",
 571 |         "    dataset_text_field = \"text\",\n",
 572 |         "    max_seq_length = max_seq_length,\n",
 573 |         "    dataset_num_proc = 2,\n",
 574 |         "    packing = False, # Can make training 5x faster for short sequences.\n",
 575 |         "    args = TrainingArguments(\n",
 576 |         "        per_device_train_batch_size = 2,\n",
 577 |         "        gradient_accumulation_steps = 4,\n",
 578 |         "        warmup_steps = 5,\n",
 579 |         "        num_train_epochs=1,\n",
 580 |         "        learning_rate = 2e-4,\n",
 581 |         "        fp16 = not is_bfloat16_supported(),\n",
 582 |         "        bf16 = is_bfloat16_supported(),\n",
 583 |         "        logging_steps = 1,\n",
 584 |         "        optim = \"adamw_8bit\",\n",
 585 |         "        weight_decay = 0.01,\n",
 586 |         "        lr_scheduler_type = \"linear\",\n",
 587 |         "        seed = 3407,\n",
 588 |         "        output_dir = \"outputs\",\n",
 589 |         "    ),\n",
 590 |         ")"
 591 |       ]
 592 |     },
 593 |     {
 594 |       "cell_type": "code",
 595 |       "execution_count": null,
 596 |       "metadata": {
 597 |         "id": "yqxqAZ7KJ4oL"
 598 |       },
 599 |       "outputs": [],
 600 |       "source": [
 601 |         "trainer_stats = trainer.train()"
 602 |       ]
 603 |     },
 604 |     {
 605 |       "cell_type": "markdown",
 606 |       "metadata": {
 607 |         "id": "ekOmTR1hSNcr"
 608 |       },
 609 |       "source": [
 610 |         "<a name=\"Inference\"></a>\n",
 611 |         "### Inference\n",
 612 |         "Let's run the model! Since we're using `Llama-3`, use `apply_chat_template` with `add_generation_prompt` set to `True` for inference."
 613 |       ]
 614 |     },
 615 |     {
 616 |       "cell_type": "code",
 617 |       "execution_count": 50,
 618 |       "metadata": {
 619 |         "colab": {
 620 |           "base_uri": "https://localhost:8080/"
 621 |         },
 622 |         "id": "kR3gIAX-SM2q",
 623 |         "outputId": "00c0beb8-e397-46f8-ac82-12b91e1eb31d"
 624 |       },
 625 |       "outputs": [
 626 |         {
 627 |           "name": "stdout",
 628 |           "output_type": "stream",
 629 |           "text": [
 630 |             "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n",
 631 |             "\n",
 632 |             "Pagal ho gya hai kya<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n",
 633 |             "\n",
 634 |             "Ha<|eot_id|>\n"
 635 |           ]
 636 |         }
 637 |       ],
 638 |       "source": [
 639 |         "from unsloth.chat_templates import get_chat_template\n",
 640 |         "from transformers import TextStreamer\n",
 641 |         "\n",
 642 |         "tokenizer = get_chat_template(\n",
 643 |         "    tokenizer,\n",
 644 |         "    chat_template = \"llama-3\", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth\n",
 645 |         "    mapping = {\"role\" : \"from\", \"content\" : \"value\", \"user\" : \"human\", \"assistant\" : \"gpt\"}, # ShareGPT style\n",
 646 |         ")\n",
 647 |         "text_streamer = TextStreamer(tokenizer)\n",
 648 |         "FastLanguageModel.for_inference(model) # Enable native 2x faster inference\n",
 649 |         "\n",
 650 |         "messages = [\n",
 651 |         "    {\"from\": \"human\", \"value\": \"Pagal ho gya hai kya\"},\n",
 652 |         "]\n",
 653 |         "inputs = tokenizer.apply_chat_template(\n",
 654 |         "    messages,\n",
 655 |         "    tokenize = True,\n",
 656 |         "    add_generation_prompt = True, # Must add for generation\n",
 657 |         "    return_tensors = \"pt\",\n",
 658 |         ").to(\"cuda\")\n",
 659 |         "\n",
 660 |         "output = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128, use_cache = True)"
 661 |       ]
 662 |     },
 663 |     {
 664 |       "cell_type": "markdown",
 665 |       "metadata": {
 666 |         "id": "CrSvZObor0lY"
 667 |       },
 668 |       "source": [
 669 |         " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!"
 670 |       ]
 671 |     },
 672 |     {
 673 |       "cell_type": "markdown",
 674 |       "metadata": {
 675 |         "id": "uMuVrWbjAzhc"
 676 |       },
 677 |       "source": [
 678 |         "<a name=\"Save\"></a>\n",
 679 |         "### Saving finetuned model (Lora Adapters Only)\n",
 680 |         "To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.\n",
 681 |         "\n",
 682 |         "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!"
 683 |       ]
 684 |     },
 685 |     {
 686 |       "cell_type": "code",
 687 |       "execution_count": 51,
 688 |       "metadata": {
 689 |         "id": "upcOlWe7A1vc"
 690 |       },
 691 |       "outputs": [],
 692 |       "source": [
 693 |         "model.save_pretrained(\"lora_model\") # Local saving\n"
 694 |       ]
 695 |     },
 696 |     {
 697 |       "cell_type": "markdown",
 698 |       "metadata": {
 699 |         "id": "TCv4vXHd61i7"
 700 |       },
 701 |       "source": [
 702 |         "### GGUF Conversion (For Ollama)\n",
 703 |         "**To use our finetuned model in our PC/laptop we will use [Ollama](https://ollama.com/)**. To use this model with  `Ollama` We have to save the model in `GGUF` Format.\n",
 704 |         "\n",
 705 |         "Unsloth allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n",
 706 |         "\n",
 707 |         "Some supported quant methods (full list on Unsloth [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):\n",
 708 |         "    * `q8_0` - Fast conversion. High resource use, but generally acceptable.\n",
 709 |         "    * `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n",
 710 |         "    * `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K"
 711 |       ]
 712 |     },
 713 |     {
 714 |       "cell_type": "code",
 715 |       "execution_count": null,
 716 |       "metadata": {
 717 |         "id": "FqfebeAdT073"
 718 |       },
 719 |       "outputs": [],
 720 |       "source": [
 721 |         "# Save to 8bit Q8_0\n",
 722 |         "if True: model.save_pretrained_gguf(\"model\", tokenizer,)\n",
 723 |         "if False: model.push_to_hub_gguf(\"hf/model\", tokenizer, token = \"\")"
 724 |       ]
 725 |     },
 726 |     {
 727 |       "cell_type": "markdown",
 728 |       "metadata": {
 729 |         "id": "bDp0zNpwe6U_"
 730 |       },
 731 |       "source": [
 732 |         "Once the fine-tuning is complete, the model `unsloth.Q8_0.gguf` is saved in the `models/` folder. You need to download this `GGUF` file and `Modelfile` too. You can directly download it by locating the file in the files section of Google Colab, or you can copy this file to your Google Drive to download it from there. If your internet connection is slow like mine, the Google Drive method is best because the file is large (approximately 8GB). Here are the steps for both methods."
 733 |       ]
 734 |     },
 735 |     {
 736 |       "cell_type": "markdown",
 737 |       "metadata": {
 738 |         "id": "E6GXCfCxwzbv"
 739 |       },
 740 |       "source": [
 741 |         "### Direct Download via Colab\n",
 742 |         "1. Click on the files section in Google Colab.\n",
 743 |         "2. Locate the models folder. Then expand it by clicking on the arrow located to the left of the folder name.\n",
 744 |         "3. Choose the file `unsloth.Q8_0.gguf` & `Modelfile`, then hover the mouse cursor over the filename.\n",
 745 |         "4. Click on the three dots, then select Download.\n",
 746 |         "\n"
 747 |       ]
 748 |     },
 749 |     {
 750 |       "cell_type": "markdown",
 751 |       "metadata": {},
 752 |       "source": [
 753 |         "![Download_IMG](https://github.com/Eviltr0N/Make-AI-Clone-of-Yourself/raw/main/img/file_download.png)"
 754 |       ]
 755 |     },
 756 |     {
 757 |       "cell_type": "markdown",
 758 |       "metadata": {
 759 |         "id": "w2T3YwrQwzbv"
 760 |       },
 761 |       "source": [
 762 |         "### Download via Google Drive\n",
 763 |         "* Before using this method, make sure you have 8GB of free space left in your Google Drive. Otherwise, this will not work.\n",
 764 |         "1. Run the below cell to mount your drive with your Colab account."
 765 |       ]
 766 |     },
 767 |     {
 768 |       "cell_type": "code",
 769 |       "execution_count": null,
 770 |       "metadata": {
 771 |         "id": "-zZMNG20wzbv"
 772 |       },
 773 |       "outputs": [],
 774 |       "source": [
 775 |         "from google.colab import drive\n",
 776 |         "drive.mount('/content/drive')"
 777 |       ]
 778 |     },
 779 |     {
 780 |       "cell_type": "markdown",
 781 |       "metadata": {
 782 |         "id": "VQ09Nt60wzbv"
 783 |       },
 784 |       "source": [
 785 |         "* Run this cell to copy the model into your Google Drive."
 786 |       ]
 787 |     },
 788 |     {
 789 |       "cell_type": "code",
 790 |       "execution_count": null,
 791 |       "metadata": {
 792 |         "id": "YVSdA_1Ywzbv"
 793 |       },
 794 |       "outputs": [],
 795 |       "source": [
 796 |         "!mkdir /content/drive/MyDrive/finetuned_model\n",
 797 |         "!cp /content/model/unsloth.Q8_0.gguf /content/drive/MyDrive/finetuned_model/\n",
 798 |         "!cp /content/model/Modelfile /content/drive/MyDrive/finetuned_model/"
 799 |       ]
 800 |     },
 801 |     {
 802 |       "cell_type": "markdown",
 803 |       "metadata": {
 804 |         "id": "CZAEERK4Tj1C"
 805 |       },
 806 |       "source": [
 807 |         "### Using Finetuned Model With Ollama and Whatsapp\n",
 808 |         "Now follow this guide on my [Github](https://github.com/Eviltr0N/Make-AI-Clone-of-Yourself?tab=readme-ov-file#loading-the-model-into-ollama) to chat with Your finetuned model.\n",
 809 |         "[Here](https://github.com/Eviltr0N/Make-AI-Clone-of-Yourself?tab=readme-ov-file#loading-the-model-into-ollama)  "
 810 |       ]
 811 |     }
 812 |   ],
 813 |   "metadata": {
 814 |     "accelerator": "GPU",
 815 |     "colab": {
 816 |       "gpuType": "T4",
 817 |       "provenance": []
 818 |     },
 819 |     "kernelspec": {
 820 |       "display_name": "Python 3",
 821 |       "name": "python3"
 822 |     },
 823 |     "language_info": {
 824 |       "name": "python"
 825 |     },
 826 |     "widgets": {
 827 |       "application/vnd.jupyter.widget-state+json": {
 828 |         "1d018226491848a593d2fc0f983528f0": {
 829 |           "model_module": "@jupyter-widgets/base",
 830 |           "model_module_version": "1.2.0",
 831 |           "model_name": "LayoutModel",
 832 |           "state": {
 833 |             "_model_module": "@jupyter-widgets/base",
 834 |             "_model_module_version": "1.2.0",
 835 |             "_model_name": "LayoutModel",
 836 |             "_view_count": null,
 837 |             "_view_module": "@jupyter-widgets/base",
 838 |             "_view_module_version": "1.2.0",
 839 |             "_view_name": "LayoutView",
 840 |             "align_content": null,
 841 |             "align_items": null,
 842 |             "align_self": null,
 843 |             "border": null,
 844 |             "bottom": null,
 845 |             "display": null,
 846 |             "flex": null,
 847 |             "flex_flow": null,
 848 |             "grid_area": null,
 849 |             "grid_auto_columns": null,
 850 |             "grid_auto_flow": null,
 851 |             "grid_auto_rows": null,
 852 |             "grid_column": null,
 853 |             "grid_gap": null,
 854 |             "grid_row": null,
 855 |             "grid_template_areas": null,
 856 |             "grid_template_columns": null,
 857 |             "grid_template_rows": null,
 858 |             "height": null,
 859 |             "justify_content": null,
 860 |             "justify_items": null,
 861 |             "left": null,
 862 |             "margin": null,
 863 |             "max_height": null,
 864 |             "max_width": null,
 865 |             "min_height": null,
 866 |             "min_width": null,
 867 |             "object_fit": null,
 868 |             "object_position": null,
 869 |             "order": null,
 870 |             "overflow": null,
 871 |             "overflow_x": null,
 872 |             "overflow_y": null,
 873 |             "padding": null,
 874 |             "right": null,
 875 |             "top": null,
 876 |             "visibility": null,
 877 |             "width": null
 878 |           }
 879 |         },
 880 |         "1d777ec6fd994cd0a10b8a5ee5a9ecce": {
 881 |           "model_module": "@jupyter-widgets/controls",
 882 |           "model_module_version": "1.5.0",
 883 |           "model_name": "HBoxModel",
 884 |           "state": {
 885 |             "_dom_classes": [],
 886 |             "_model_module": "@jupyter-widgets/controls",
 887 |             "_model_module_version": "1.5.0",
 888 |             "_model_name": "HBoxModel",
 889 |             "_view_count": null,
 890 |             "_view_module": "@jupyter-widgets/controls",
 891 |             "_view_module_version": "1.5.0",
 892 |             "_view_name": "HBoxView",
 893 |             "box_style": "",
 894 |             "children": [
 895 |               "IPY_MODEL_fb53fae9bb8f43668e58817861a99830",
 896 |               "IPY_MODEL_39eb48a59bbc43459dd403c5361e7277",
 897 |               "IPY_MODEL_8f43afa148464357ac7084cfc82b54f5"
 898 |             ],
 899 |             "layout": "IPY_MODEL_e442477c71da43c3a55b494f01c9adf7"
 900 |           }
 901 |         },
 902 |         "39eb48a59bbc43459dd403c5361e7277": {
 903 |           "model_module": "@jupyter-widgets/controls",
 904 |           "model_module_version": "1.5.0",
 905 |           "model_name": "FloatProgressModel",
 906 |           "state": {
 907 |             "_dom_classes": [],
 908 |             "_model_module": "@jupyter-widgets/controls",
 909 |             "_model_module_version": "1.5.0",
 910 |             "_model_name": "FloatProgressModel",
 911 |             "_view_count": null,
 912 |             "_view_module": "@jupyter-widgets/controls",
 913 |             "_view_module_version": "1.5.0",
 914 |             "_view_name": "ProgressView",
 915 |             "bar_style": "success",
 916 |             "description": "",
 917 |             "description_tooltip": null,
 918 |             "layout": "IPY_MODEL_1d018226491848a593d2fc0f983528f0",
 919 |             "max": 2157,
 920 |             "min": 0,
 921 |             "orientation": "horizontal",
 922 |             "style": "IPY_MODEL_accf1601f86d4ff785f39262f82c527a",
 923 |             "value": 2157
 924 |           }
 925 |         },
 926 |         "8f43afa148464357ac7084cfc82b54f5": {
 927 |           "model_module": "@jupyter-widgets/controls",
 928 |           "model_module_version": "1.5.0",
 929 |           "model_name": "HTMLModel",
 930 |           "state": {
 931 |             "_dom_classes": [],
 932 |             "_model_module": "@jupyter-widgets/controls",
 933 |             "_model_module_version": "1.5.0",
 934 |             "_model_name": "HTMLModel",
 935 |             "_view_count": null,
 936 |             "_view_module": "@jupyter-widgets/controls",
 937 |             "_view_module_version": "1.5.0",
 938 |             "_view_name": "HTMLView",
 939 |             "description": "",
 940 |             "description_tooltip": null,
 941 |             "layout": "IPY_MODEL_f8a76f16cc6243ee9f4063b45cf4ad59",
 942 |             "placeholder": "​",
 943 |             "style": "IPY_MODEL_eb776dfe78db49a4802f03b41ad2e15e",
 944 |             "value": " 2157/2157 [00:00&lt;00:00, 9866.30 examples/s]"
 945 |           }
 946 |         },
 947 |         "accf1601f86d4ff785f39262f82c527a": {
 948 |           "model_module": "@jupyter-widgets/controls",
 949 |           "model_module_version": "1.5.0",
 950 |           "model_name": "ProgressStyleModel",
 951 |           "state": {
 952 |             "_model_module": "@jupyter-widgets/controls",
 953 |             "_model_module_version": "1.5.0",
 954 |             "_model_name": "ProgressStyleModel",
 955 |             "_view_count": null,
 956 |             "_view_module": "@jupyter-widgets/base",
 957 |             "_view_module_version": "1.2.0",
 958 |             "_view_name": "StyleView",
 959 |             "bar_color": null,
 960 |             "description_width": ""
 961 |           }
 962 |         },
 963 |         "e442477c71da43c3a55b494f01c9adf7": {
 964 |           "model_module": "@jupyter-widgets/base",
 965 |           "model_module_version": "1.2.0",
 966 |           "model_name": "LayoutModel",
 967 |           "state": {
 968 |             "_model_module": "@jupyter-widgets/base",
 969 |             "_model_module_version": "1.2.0",
 970 |             "_model_name": "LayoutModel",
 971 |             "_view_count": null,
 972 |             "_view_module": "@jupyter-widgets/base",
 973 |             "_view_module_version": "1.2.0",
 974 |             "_view_name": "LayoutView",
 975 |             "align_content": null,
 976 |             "align_items": null,
 977 |             "align_self": null,
 978 |             "border": null,
 979 |             "bottom": null,
 980 |             "display": null,
 981 |             "flex": null,
 982 |             "flex_flow": null,
 983 |             "grid_area": null,
 984 |             "grid_auto_columns": null,
 985 |             "grid_auto_flow": null,
 986 |             "grid_auto_rows": null,
 987 |             "grid_column": null,
 988 |             "grid_gap": null,
 989 |             "grid_row": null,
 990 |             "grid_template_areas": null,
 991 |             "grid_template_columns": null,
 992 |             "grid_template_rows": null,
 993 |             "height": null,
 994 |             "justify_content": null,
 995 |             "justify_items": null,
 996 |             "left": null,
 997 |             "margin": null,
 998 |             "max_height": null,
 999 |             "max_width": null,
1000 |             "min_height": null,
1001 |             "min_width": null,
1002 |             "object_fit": null,
1003 |             "object_position": null,
1004 |             "order": null,
1005 |             "overflow": null,
1006 |             "overflow_x": null,
1007 |             "overflow_y": null,
1008 |             "padding": null,
1009 |             "right": null,
1010 |             "top": null,
1011 |             "visibility": null,
1012 |             "width": null
1013 |           }
1014 |         },
1015 |         "e8ef5c1da74f4603845780dcc02f64da": {
1016 |           "model_module": "@jupyter-widgets/controls",
1017 |           "model_module_version": "1.5.0",
1018 |           "model_name": "DescriptionStyleModel",
1019 |           "state": {
1020 |             "_model_module": "@jupyter-widgets/controls",
1021 |             "_model_module_version": "1.5.0",
1022 |             "_model_name": "DescriptionStyleModel",
1023 |             "_view_count": null,
1024 |             "_view_module": "@jupyter-widgets/base",
1025 |             "_view_module_version": "1.2.0",
1026 |             "_view_name": "StyleView",
1027 |             "description_width": ""
1028 |           }
1029 |         },
1030 |         "ea483ec87d2145dc8d94eaaa3767c826": {
1031 |           "model_module": "@jupyter-widgets/base",
1032 |           "model_module_version": "1.2.0",
1033 |           "model_name": "LayoutModel",
1034 |           "state": {
1035 |             "_model_module": "@jupyter-widgets/base",
1036 |             "_model_module_version": "1.2.0",
1037 |             "_model_name": "LayoutModel",
1038 |             "_view_count": null,
1039 |             "_view_module": "@jupyter-widgets/base",
1040 |             "_view_module_version": "1.2.0",
1041 |             "_view_name": "LayoutView",
1042 |             "align_content": null,
1043 |             "align_items": null,
1044 |             "align_self": null,
1045 |             "border": null,
1046 |             "bottom": null,
1047 |             "display": null,
1048 |             "flex": null,
1049 |             "flex_flow": null,
1050 |             "grid_area": null,
1051 |             "grid_auto_columns": null,
1052 |             "grid_auto_flow": null,
1053 |             "grid_auto_rows": null,
1054 |             "grid_column": null,
1055 |             "grid_gap": null,
1056 |             "grid_row": null,
1057 |             "grid_template_areas": null,
1058 |             "grid_template_columns": null,
1059 |             "grid_template_rows": null,
1060 |             "height": null,
1061 |             "justify_content": null,
1062 |             "justify_items": null,
1063 |             "left": null,
1064 |             "margin": null,
1065 |             "max_height": null,
1066 |             "max_width": null,
1067 |             "min_height": null,
1068 |             "min_width": null,
1069 |             "object_fit": null,
1070 |             "object_position": null,
1071 |             "order": null,
1072 |             "overflow": null,
1073 |             "overflow_x": null,
1074 |             "overflow_y": null,
1075 |             "padding": null,
1076 |             "right": null,
1077 |             "top": null,
1078 |             "visibility": null,
1079 |             "width": null
1080 |           }
1081 |         },
1082 |         "eb776dfe78db49a4802f03b41ad2e15e": {
1083 |           "model_module": "@jupyter-widgets/controls",
1084 |           "model_module_version": "1.5.0",
1085 |           "model_name": "DescriptionStyleModel",
1086 |           "state": {
1087 |             "_model_module": "@jupyter-widgets/controls",
1088 |             "_model_module_version": "1.5.0",
1089 |             "_model_name": "DescriptionStyleModel",
1090 |             "_view_count": null,
1091 |             "_view_module": "@jupyter-widgets/base",
1092 |             "_view_module_version": "1.2.0",
1093 |             "_view_name": "StyleView",
1094 |             "description_width": ""
1095 |           }
1096 |         },
1097 |         "f8a76f16cc6243ee9f4063b45cf4ad59": {
1098 |           "model_module": "@jupyter-widgets/base",
1099 |           "model_module_version": "1.2.0",
1100 |           "model_name": "LayoutModel",
1101 |           "state": {
1102 |             "_model_module": "@jupyter-widgets/base",
1103 |             "_model_module_version": "1.2.0",
1104 |             "_model_name": "LayoutModel",
1105 |             "_view_count": null,
1106 |             "_view_module": "@jupyter-widgets/base",
1107 |             "_view_module_version": "1.2.0",
1108 |             "_view_name": "LayoutView",
1109 |             "align_content": null,
1110 |             "align_items": null,
1111 |             "align_self": null,
1112 |             "border": null,
1113 |             "bottom": null,
1114 |             "display": null,
1115 |             "flex": null,
1116 |             "flex_flow": null,
1117 |             "grid_area": null,
1118 |             "grid_auto_columns": null,
1119 |             "grid_auto_flow": null,
1120 |             "grid_auto_rows": null,
1121 |             "grid_column": null,
1122 |             "grid_gap": null,
1123 |             "grid_row": null,
1124 |             "grid_template_areas": null,
1125 |             "grid_template_columns": null,
1126 |             "grid_template_rows": null,
1127 |             "height": null,
1128 |             "justify_content": null,
1129 |             "justify_items": null,
1130 |             "left": null,
1131 |             "margin": null,
1132 |             "max_height": null,
1133 |             "max_width": null,
1134 |             "min_height": null,
1135 |             "min_width": null,
1136 |             "object_fit": null,
1137 |             "object_position": null,
1138 |             "order": null,
1139 |             "overflow": null,
1140 |             "overflow_x": null,
1141 |             "overflow_y": null,
1142 |             "padding": null,
1143 |             "right": null,
1144 |             "top": null,
1145 |             "visibility": null,
1146 |             "width": null
1147 |           }
1148 |         },
1149 |         "fb53fae9bb8f43668e58817861a99830": {
1150 |           "model_module": "@jupyter-widgets/controls",
1151 |           "model_module_version": "1.5.0",
1152 |           "model_name": "HTMLModel",
1153 |           "state": {
1154 |             "_dom_classes": [],
1155 |             "_model_module": "@jupyter-widgets/controls",
1156 |             "_model_module_version": "1.5.0",
1157 |             "_model_name": "HTMLModel",
1158 |             "_view_count": null,
1159 |             "_view_module": "@jupyter-widgets/controls",
1160 |             "_view_module_version": "1.5.0",
1161 |             "_view_name": "HTMLView",
1162 |             "description": "",
1163 |             "description_tooltip": null,
1164 |             "layout": "IPY_MODEL_ea483ec87d2145dc8d94eaaa3767c826",
1165 |             "placeholder": "​",
1166 |             "style": "IPY_MODEL_e8ef5c1da74f4603845780dcc02f64da",
1167 |             "value": "Map: 100%"
1168 |           }
1169 |         }
1170 |       }
1171 |     }
1172 |   },
1173 |   "nbformat": 4,
1174 |   "nbformat_minor": 0
1175 | }
1176 | 


--------------------------------------------------------------------------------
/img/file_download.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Eviltr0N/Make-AI-Clone-of-Yourself/5aa07a15ca334e5a73e8207b24eab7b36dfb8996/img/file_download.png


--------------------------------------------------------------------------------
/img/file_upload.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Eviltr0N/Make-AI-Clone-of-Yourself/5aa07a15ca334e5a73e8207b24eab7b36dfb8996/img/file_upload.png


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | langchain==0.2.5
2 | langchain-community==0.2.5
3 | langchain-core==0.2.9
4 | WPP_Whatsapp==0.4.0


--------------------------------------------------------------------------------