├── .gitignore ├── LICENSE ├── README.md ├── Training_You.ipynb ├── demo.gif ├── extension ├── content_scripts │ ├── jquery.min.js │ └── you.js ├── icons48.png ├── icons96.png ├── manifest.json └── popup │ ├── popup.css │ └── popup.html ├── notebooks └── Untitled.ipynb ├── requirements.txt └── server.py /.gitignore: -------------------------------------------------------------------------------- 1 | data 2 | output 3 | */.ipynb_checkpoints/* 4 | *.DS_STORE 5 | notebooks/.nbgrader.log -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Vivek Aithal and Rishi Mehta 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # You

2 | 3 | An auto-completion tool that is you. 4 | 5 | You lets you train a generative model that can mimic your personal style, and use it as an autocompletion tool. Currently, You trains on WhatsApp chat history, and offers autocomplete suggestions on WhatsApp Web via a browser extension (Chrome or Firefox). This can be extended to train and autocomplete on more personal communication apps (Messenger, email, Slack, Twitter). Everything runs locally and is completely private. 6 | 7 | Contributors : [nuwandavek](https://twitter.com/nuwandavek), [rishicomplex](https://twitter.com/rishicomplex) 8 | 9 | Blog Post : [https://vivekaithal.co/posts/you-complete-you/](https://vivekaithal.co/posts/you-complete-you/) 10 | 11 | ## Demo 12 | ![Demo](demo.gif) 13 | ## Train You on your data 14 | 15 | Training You on your own data is somewhat clunky right now. Follow these steps. First, clone the You repository and install its dependencies. 16 | 17 | ```bash 18 | $ git clone https://github.com/nuwandavek/you.git 19 | $ cd you 20 | $ pip install -r requirements.txt 21 | ``` 22 | 23 | 24 | ### Fine-tune the model on your WhatsApp chat history 25 | 26 | Next, we'll fine-tune the [DistilGPT2](https://huggingface.co/distilgpt2) model on your WhatsApp history. Follow the instructions in [this colab](https://colab.research.google.com/github/nuwandavek/you/blob/master/Training_You.ipynb). Remember: the more data you provide, the better You works! Download the `model.zip` file at the end of this step, and unzip it to a location of your choice. 
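For intuition, the cleaning that the colab applies to each exported chat can be sketched roughly as follows. This is a simplified illustration, not the notebook's code: it works on `str` instead of bytes, and the name `clean_chat`, the timestamp pattern, the small `chunk_length`, and the sample messages are all assumptions made for the example.

```python
import re

# Assumed export format: "1/5/21, 7:35 PM - Vivek: message" (it varies by locale).
TIMESTAMP = re.compile(r"^\d+/\d+/\d+.*?- ", flags=re.MULTILINE)
END_OF_MESSAGE = "#"
END_OF_CHUNK = "<|endoftext|>"


def clean_chat(raw_text, chunk_length=500):
    """Return training chunks built from one exported chat (hypothetical helper)."""
    # Drop the leading "date, time - " prefix from every line.
    without_timestamps = TIMESTAMP.sub("", raw_text.strip())
    # Mark the end of every line with the separator token.
    lines = [line + END_OF_MESSAGE for line in without_timestamps.split("\n")]
    chunks = []
    for start in range(0, len(lines), chunk_length):
        # Each chunk holds up to chunk_length lines and ends with the chunk token.
        chunks.append("\n".join(lines[start:start + chunk_length]) + END_OF_CHUNK)
    return chunks


# Made-up sample in the assumed export format.
sample = (
    "1/5/21, 7:35 PM - Vivek: What plans today?\n"
    "1/5/21, 7:36 PM - Sreejith2: Hey, no plans as such\n"
    "1/5/21, 7:37 PM - Vivek: Free for a call?\n"
)
chunks = clean_chat(sample, chunk_length=2)
print(chunks[0])
```

Keeping the sender names in the text is what lets the fine-tuned model continue a conversation in a particular person's voice when the prompt ends with that person's name.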
27 | 28 | ### Install the You Browser extension 29 | > *Note : This extension was tested on Firefox and Chrome* 30 | 31 | **Firefox** 32 | 33 | - Enter `about:debugging#/runtime/this-firefox` in the address bar 34 | - Click on `Load Temporary Add-on...` 35 | - Select the `manifest.json` file in the `extension` folder 36 | - Click on `Reload` for good measure 37 | 38 | **Chrome** 39 | 40 | - Enter `chrome://extensions/` in the address bar 41 | - Toggle `Developer Mode` (top-right) if you haven't already 42 | - Click on `Load unpacked` 43 | - Select the entire `extension` folder 44 | 45 | 46 | ### Start a server with your model 47 | As the first command line argument, pass the path to the directory containing the model you trained above. 48 | 49 | ``` 50 | python server.py ../Downloads/output 51 | ``` 52 | 53 | ### Usage 54 | - Once you have the browser extension and the server working, go to `https://web.whatsapp.com/`. 55 | - Make sure the extension is working (you should see a logo at the top-right of the screen indicating that the extension is active). 56 | - Now click on any user you want to chat with, as usual. 57 | - Whenever you want `You` to fill in, press the `tab` key (you can `tab` to get the whole message prompt, or to finish a sentence you've already started typing). 58 | - Select one of the 3 prompts (keyboard and mouse supported), or press the `Esc` key to ignore the prompts. 
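As a rough sketch of how raw model output could be reduced to the short prompts offered in the UI, the snippet below trims each generation to the current message. It assumes the `#` end-of-message separator used in the training notebook; `extract_suggestions` and its parameters are hypothetical names for illustration and are not taken from `server.py`.

```python
def extract_suggestions(prompt, generated_texts, separator="#", limit=3):
    """Trim each generation to the text completing the current message."""
    suggestions = []
    for text in generated_texts:
        # Generated text usually echoes the prompt first; drop it if present.
        completion = text[len(prompt):] if text.startswith(prompt) else text
        # Keep only the current message: stop at the first separator.
        completion = completion.split(separator, 1)[0].strip()
        # Skip empty completions and duplicates.
        if completion and completion not in suggestions:
            suggestions.append(completion)
    return suggestions[:limit]


# Made-up prompt and generations, in the '#'-separated format of the notebook.
prompt = "Sreejith2: Free for a call?#Vivek:"
generated = [
    prompt + " Yo in 5 mins#Sreejith2: Works#",
    prompt + " Ping me#",
    prompt + " Yo in 5 mins#Vivek: Ping me#",
]
suggestions = extract_suggestions(prompt, generated)
print(suggestions)
```

Dropping duplicates means fewer than three prompts may be shown when the sampled completions coincide.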
59 | 60 | --- 61 | 62 | ### ToDos 63 | - Model 64 | - [x] Fine-tune DistilGPT2 on WhatsApp chat history 65 | - [x] Preprocess and clean data 66 | - [ ] Compute uncertainty and filter responses 67 | - [ ] Compute recommended training data size 68 | - [ ] Experiment with conversation pre-training on a large corpus 69 | - [ ] Check out other architectures 70 | - [ ] Experiment with `platform` flag in the same model to handle multiple chat platforms 71 | - UI 72 | - [x] Chrome/Firefox extension for WhatsApp Web (feature complete) 73 | - Extend to 74 | - [ ] Facebook/Messenger 75 | - [ ] Hangouts 76 | - [ ] Gmail 77 | - [ ] Slack 78 | - [ ] Twitter 79 | - Access 80 | - [ ] Blog Post! 81 | - [ ] Make training easier (can it be any easier, though?) 82 | - [ ] Explore using `tf.js` in the extension to avoid the server (will allow many many more people to use it) 83 | 84 | --- 85 | 86 | ### Contributing 87 | Check out the ToDos. Extending the UI to other platforms may be the easiest place to begin. 88 | -------------------------------------------------------------------------------- /Training_You.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Training You", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "authorship_tag": "ABX9TyOvPZAwQypoWmGilHbvmYBu", 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | }, 16 | "accelerator": "GPU" 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "view-in-github", 23 | "colab_type": "text" 24 | }, 25 | "source": [ 26 | " $\"Open$ " 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "id": "Rm5EzwNPr8w0" 33 | }, 34 | "source": [ 35 | "## Setup" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": { 41 | "id": "lTGlWqio-jFO" 42 | }, 43 | "source": [ 44 | 
"First, connect to a GPU runtime: open Edit->Notebook Settings and select GPU as the hardware accelerator. Then, run the block below to install the libraries required to fine-tune DistilGPT2 on your WhatsApp history." 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "metadata": { 50 | "cellView": "form", 51 | "id": "vTVbuOL6Xsx3" 52 | }, 53 | "source": [ 54 | "#@title Install libraries\n", 55 | "!pip install transformers\n", 56 | "!git clone https://github.com/huggingface/transformers.git\n", 57 | "!pip install ./transformers\n", 58 | "!pip install -r ./transformers/examples/language-modeling/requirements.txt\n", 59 | "!mkdir output" 60 | ], 61 | "execution_count": null, 62 | "outputs": [] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": { 67 | "id": "Rb6HmORjsKtw" 68 | }, 69 | "source": [ 70 | "## Upload files for training" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": { 76 | "id": "V8zKaOmC_M2x" 77 | }, 78 | "source": [ 79 | "To train the model on your chat history, first export your chats as txt files (instructions [here](https://faq.whatsapp.com/android/chats/how-to-save-your-chat-history/?lang=en)). Then, run the following block and upload all your txt files." 
80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "metadata": { 85 | "id": "Vfrb4h88z_mf", 86 | "cellView": "form" 87 | }, 88 | "source": [ 89 | "#@title Upload WhatsApp history files\n", 90 | "import re\n", 91 | "\n", 92 | "def RemoveTimestamps(text):\n", 93 | "  return re.sub(b'\\d+/\\d+/\\d+.*?-\\ ', b'', text)\n", 94 | "\n", 95 | "def UnicodeString(bytes_string):\n", 96 | "  return bytes_string.decode('utf-8')\n", 97 | "\n", 98 | "def AddSeparators(file_text):\n", 99 | "  return b'#\\n'.join(file_text.split(b'\\n'))\n", 100 | "\n", 101 | "CHUNK_LENGTH = 500\n", 102 | "def ChunkFile(file_text):\n", 103 | "  lines = file_text.split(b'\\n')\n", 104 | "  chunks = []\n", 105 | "  for line_index in range(0, len(lines), CHUNK_LENGTH):\n", 106 | "    chunk = b'\\n'.join(lines[line_index:line_index+CHUNK_LENGTH])\n", 107 | "    chunk += b'<|endoftext|>'\n", 108 | "    chunks.append(chunk)\n", 109 | "  return chunks\n", 110 | "\n", 111 | "# Flatten the per-file chunk lists and shuffle them together.\n", 112 | "import random\n", 113 | "def MixChunks(chunked_files):\n", 114 | "  all_chunks = [chunk for chunked_file in chunked_files for chunk in chunked_file]\n", 115 | "  random.shuffle(all_chunks)\n", 116 | "  return all_chunks\n", 117 | "\n", 118 | "def ConvertChunksToString(chunks):\n", 119 | "  return b'\\n'.join(chunks)\n", 120 | "\n", 121 | "def GetShuffledAndCleanedTextFromFiles(file_contents):\n", 122 | "  file_chunks = []\n", 123 | "  for file_content in file_contents:\n", 124 | "    file_chunks.append(ChunkFile(AddSeparators(RemoveTimestamps(file_content))))\n", 125 | "  return ConvertChunksToString(MixChunks(file_chunks))\n", 126 | "\n", 127 | "# Print up to 50 consecutive lines starting at a random position in the file.\n", 128 | "\n", 129 | "def SampleTextFromFile(file):\n", 130 | "  file_contents = open(file).readlines()\n", 131 | "  begin = random.randint(0, max(0, len(file_contents) - 50))\n", 132 | "  for line in file_contents[begin:begin+50]:\n", 133 | "    print(line, end='')\n", 134 | "\n", 135 | "from google.colab import files\n", 136 | "uploaded_files = files.upload()" 137 | ], 138 | 
"execution_count": null, 139 | "outputs": [] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": { 144 | "id": "Aw8e-Hr_gWB_" 145 | }, 146 | "source": [ 147 | "## Construct training data" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": { 153 | "id": "iOh8A-VNgh2P" 154 | }, 155 | "source": [ 156 | "Next, we clean up the data and prep it for training." 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "metadata": { 162 | "id": "cDPYjntwb2US", 163 | "cellView": "form" 164 | }, 165 | "source": [ 166 | "#@title Clean data and create train and test splits.\n", 167 | "cleaned_text = GetShuffledAndCleanedTextFromFiles(uploaded_files.values())\n", 168 | "data_file = open('data.txt', 'wb')\n", 169 | "data_file.write(cleaned_text)\n", 170 | "data_file.close()\n", 171 | "num_lines = cleaned_text.count(b'\\n')\n", 172 | "test_size = int(0.1 * num_lines)\n", 173 | "train_size = num_lines - test_size\n", 174 | "# Hold out the last 10% of lines as test.txt; the rest become train.txt.\n", 175 | "!tail -n {test_size} data.txt > test.txt\n", 176 | "!head -n {train_size} data.txt > train.txt" 177 | ], 178 | "execution_count": null, 179 | "outputs": [] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": { 184 | "id": "4QyQGDQ8gmlR" 185 | }, 186 | "source": [ 187 | "We can sample chunks from the training data file to inspect it. Note that a '#' token has been added at the ends of messages, and a <|endoftext|> token marks the end of each chunk of up to 500 lines taken from one chat file." 
188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "metadata": { 193 | "colab": { 194 | "base_uri": "https://localhost:8080/" 195 | }, 196 | "id": "ejxPLIfHIMuy", 197 | "outputId": "1c86cef4-498d-4b45-8d69-1fdb3dc3d143" 198 | }, 199 | "source": [ 200 | "SampleTextFromFile('train.txt')" 201 | ], 202 | "execution_count": null, 203 | "outputs": [ 204 | { 205 | "output_type": "stream", 206 | "text": [ 207 | "Vivek: Can transfer after one hour of adding#\n", 208 | "Vivek: What verification thing?#\n", 209 | "Sreejith2: The bank account should have 40 lakhs thing#\n", 210 | "Sreejith2: Keep it and transfer after no?#\n", 211 | "Vivek: Yoyo all that is over \\m/#\n", 212 | "Sreejith2: Wooh!#\n", 213 | "Sreejith2: Peace peace#\n", 214 | "Vivek: That was required before visa#\n", 215 | "Sreejith2: Transfer off then#\n", 216 | "Vivek: Now peacemax#\n", 217 | "Vivek: 😅#\n", 218 | "Vivek: Yoyoyo#\n", 219 | "Sreejith2: Hahaha nice nice!#\n", 220 | "Vivek: What plans today?#\n", 221 | "Vivek: Free for a call?#\n", 222 | "Sreejith2: Hey, no plans as such#\n", 223 | "Sreejith2: Yo in 5 mins#\n", 224 | "Vivek: Yoyo#\n", 225 | "Vivek: Ping me#\n", 226 | "Sreejith2: Haan#\n", 227 | "Vivek: Eyo#\n", 228 | "Vivek: I sent 1000 rs#\n", 229 | "Vivek: Got that?#\n", 230 | "Vivek: Once you confirm I'll transfer the rest#\n", 231 | "Sreejith2: Hey got#\n", 232 | "Sreejith2: Got 1k#\n", 233 | "Vivek: Yoyoyo#<|endoftext|>\n", 234 | "Himaya: hope I have a good day#\n", 235 | "Mihir London: https://player.vimeo.com/video/427943452#\n", 236 | "Mihir London: Wtf#\n", 237 | "Sreejith2: Wow that's amazing 🤯#\n", 238 | "Rishi Amreeka: https://youtu.be/fZSFNUT6iY8#\n", 239 | "Rishi Amreeka: Have you guys seen this one?#\n", 240 | "Rishi Amreeka: Pretty insane#\n", 241 | "Sreejith2: Wow, wtf!#\n", 242 | "Sreejith2: Bet this would win pioneer easy 😅#\n", 243 | "Vivek: Are we playing tomorrow?#\n", 244 | "Vivek: :)#\n", 245 | "Vivek: Wtfffff#\n", 246 | "Vikrant London: They also released an API 
yesterday https://beta.openai.com/#\n", 247 | "Vivek: Yeah, I saw this.#\n", 248 | "Vikrant London: I'm down 👻#\n", 249 | "Rishi Amreeka: I won't be able to make it in the for the first 4 hours#\n", 250 | "Vivek: Whats the start time?#\n", 251 | "Vivek: Also I've to do some work in the morning. Shall we start at 11am ish? Like last week?#\n", 252 | "Vivek: Sreejith wakes up at that time too I think#\n", 253 | "Sreejith2: Works#\n", 254 | "Sreejith2: 11:30 pm indian time iiuc#\n", 255 | "Sreejith2: What time can you make it?#\n", 256 | "Vikrant London: I dreamt about you, and in my dream you said the same thing#\n" 257 | ], 258 | "name": "stdout" 259 | } 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "id": "xQYhEC7-uRZ6" 266 | }, 267 | "source": [ 268 | "## Train model" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": { 274 | "id": "CrWK0ONLhcqT" 275 | }, 276 | "source": [ 277 | "Next, we fine-tune the DistilGPT2 model on our training data. Depending on how many files you uploaded, this could take between 5 and 30 minutes." 
278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "metadata": { 283 | "colab": { 284 | "base_uri": "https://localhost:8080/" 285 | }, 286 | "id": "x8KUxgU6_mjl", 287 | "outputId": "b8392a3a-423b-4a90-f380-bd2f0bc2807d" 288 | }, 289 | "source": [ 290 | "!python ./transformers/examples/language-modeling/run_clm.py --model_name_or_path distilgpt2 --train_file train.txt --validation_file test.txt --do_train --do_eval --output_dir ./output --per_gpu_train_batch_size 1 --per_gpu_eval_batch_size 1 --save_steps 800 --eval_steps 800 --logging_steps 800 --evaluation_strategy steps --overwrite_output_dir --block_size 256" 291 | ], 292 | "execution_count": null, 293 | "outputs": [ 294 | { 295 | "output_type": "stream", 296 | "text": [ 297 | "2021-01-05 07:35:25.320305: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1\n", 298 | "01/05/2021 07:35:27 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False\n", 299 | "01/05/2021 07:35:27 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir=./output, overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, model_parallel=False, evaluation_strategy=EvaluationStrategy.STEPS, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_steps=0, logging_dir=runs/Jan05_07-35-27_6201cf33901f, logging_first_step=False, logging_steps=800, save_steps=800, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=800, dataloader_num_workers=0, 
past_index=-1, run_name=./output, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, fp16_backend=auto, sharded_ddp=False, label_smoothing_factor=0.0, adafactor=False)\n", 300 | "Downloading: 2.57kB [00:00, 3.17MB/s] \n", 301 | "Using custom data configuration default\n", 302 | "Downloading and preparing dataset text/default-ead5d6d2afdaed66 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/text/default-ead5d6d2afdaed66/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab...\n", 303 | "Dataset text downloaded and prepared to /root/.cache/huggingface/datasets/text/default-ead5d6d2afdaed66/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab. Subsequent calls will reuse this data.\n", 304 | "01/05/2021 07:35:28 - INFO - filelock - Lock 139827271320912 acquired on /root/.cache/huggingface/transformers/f985248d2791fcff97732e4ee263617adec1edb5429a2b8421734c6d14e39bee.422318838d1ec4e061efb4ea29671cb2a044e244dc69229682bebd7cacc81631.lock\n", 305 | "[INFO|file_utils.py:1334] 2021-01-05 07:35:28,948 >> https://huggingface.co/distilgpt2/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp55o8lq_4\n", 306 | "Downloading: 100% 762/762 [00:00<00:00, 636kB/s]\n", 307 | "[INFO|file_utils.py:1338] 2021-01-05 07:35:29,214 >> storing https://huggingface.co/distilgpt2/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/f985248d2791fcff97732e4ee263617adec1edb5429a2b8421734c6d14e39bee.422318838d1ec4e061efb4ea29671cb2a044e244dc69229682bebd7cacc81631\n", 308 | "[INFO|file_utils.py:1341] 2021-01-05 07:35:29,214 >> creating metadata file for 
/root/.cache/huggingface/transformers/f985248d2791fcff97732e4ee263617adec1edb5429a2b8421734c6d14e39bee.422318838d1ec4e061efb4ea29671cb2a044e244dc69229682bebd7cacc81631\n", 309 | "01/05/2021 07:35:29 - INFO - filelock - Lock 139827271320912 released on /root/.cache/huggingface/transformers/f985248d2791fcff97732e4ee263617adec1edb5429a2b8421734c6d14e39bee.422318838d1ec4e061efb4ea29671cb2a044e244dc69229682bebd7cacc81631.lock\n", 310 | "[INFO|configuration_utils.py:431] 2021-01-05 07:35:29,215 >> loading configuration file https://huggingface.co/distilgpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/f985248d2791fcff97732e4ee263617adec1edb5429a2b8421734c6d14e39bee.422318838d1ec4e061efb4ea29671cb2a044e244dc69229682bebd7cacc81631\n", 311 | "[INFO|configuration_utils.py:467] 2021-01-05 07:35:29,216 >> Model config GPT2Config {\n", 312 | " \"_num_labels\": 1,\n", 313 | " \"activation_function\": \"gelu_new\",\n", 314 | " \"architectures\": [\n", 315 | " \"GPT2LMHeadModel\"\n", 316 | " ],\n", 317 | " \"attn_pdrop\": 0.1,\n", 318 | " \"bos_token_id\": 50256,\n", 319 | " \"embd_pdrop\": 0.1,\n", 320 | " \"eos_token_id\": 50256,\n", 321 | " \"gradient_checkpointing\": false,\n", 322 | " \"id2label\": {\n", 323 | " \"0\": \"LABEL_0\"\n", 324 | " },\n", 325 | " \"initializer_range\": 0.02,\n", 326 | " \"label2id\": {\n", 327 | " \"LABEL_0\": 0\n", 328 | " },\n", 329 | " \"layer_norm_epsilon\": 1e-05,\n", 330 | " \"model_type\": \"gpt2\",\n", 331 | " \"n_ctx\": 1024,\n", 332 | " \"n_embd\": 768,\n", 333 | " \"n_head\": 12,\n", 334 | " \"n_inner\": null,\n", 335 | " \"n_layer\": 6,\n", 336 | " \"n_positions\": 1024,\n", 337 | " \"resid_pdrop\": 0.1,\n", 338 | " \"summary_activation\": null,\n", 339 | " \"summary_first_dropout\": 0.1,\n", 340 | " \"summary_proj_to_labels\": true,\n", 341 | " \"summary_type\": \"cls_index\",\n", 342 | " \"summary_use_proj\": true,\n", 343 | " \"task_specific_params\": {\n", 344 | " \"text-generation\": {\n", 345 | " 
\"do_sample\": true,\n", 346 | " \"max_length\": 50\n", 347 | " }\n", 348 | " },\n", 349 | " \"use_cache\": true,\n", 350 | " \"vocab_size\": 50257\n", 351 | "}\n", 352 | "\n", 353 | "[INFO|configuration_utils.py:431] 2021-01-05 07:35:29,481 >> loading configuration file https://huggingface.co/distilgpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/f985248d2791fcff97732e4ee263617adec1edb5429a2b8421734c6d14e39bee.422318838d1ec4e061efb4ea29671cb2a044e244dc69229682bebd7cacc81631\n", 354 | "[INFO|configuration_utils.py:467] 2021-01-05 07:35:29,482 >> Model config GPT2Config {\n", 355 | " \"_num_labels\": 1,\n", 356 | " \"activation_function\": \"gelu_new\",\n", 357 | " \"architectures\": [\n", 358 | " \"GPT2LMHeadModel\"\n", 359 | " ],\n", 360 | " \"attn_pdrop\": 0.1,\n", 361 | " \"bos_token_id\": 50256,\n", 362 | " \"embd_pdrop\": 0.1,\n", 363 | " \"eos_token_id\": 50256,\n", 364 | " \"gradient_checkpointing\": false,\n", 365 | " \"id2label\": {\n", 366 | " \"0\": \"LABEL_0\"\n", 367 | " },\n", 368 | " \"initializer_range\": 0.02,\n", 369 | " \"label2id\": {\n", 370 | " \"LABEL_0\": 0\n", 371 | " },\n", 372 | " \"layer_norm_epsilon\": 1e-05,\n", 373 | " \"model_type\": \"gpt2\",\n", 374 | " \"n_ctx\": 1024,\n", 375 | " \"n_embd\": 768,\n", 376 | " \"n_head\": 12,\n", 377 | " \"n_inner\": null,\n", 378 | " \"n_layer\": 6,\n", 379 | " \"n_positions\": 1024,\n", 380 | " \"resid_pdrop\": 0.1,\n", 381 | " \"summary_activation\": null,\n", 382 | " \"summary_first_dropout\": 0.1,\n", 383 | " \"summary_proj_to_labels\": true,\n", 384 | " \"summary_type\": \"cls_index\",\n", 385 | " \"summary_use_proj\": true,\n", 386 | " \"task_specific_params\": {\n", 387 | " \"text-generation\": {\n", 388 | " \"do_sample\": true,\n", 389 | " \"max_length\": 50\n", 390 | " }\n", 391 | " },\n", 392 | " \"use_cache\": true,\n", 393 | " \"vocab_size\": 50257\n", 394 | "}\n", 395 | "\n", 396 | "01/05/2021 07:35:29 - INFO - filelock - Lock 139827134885224 acquired 
on /root/.cache/huggingface/transformers/55051ac97dcc32f0a736d21a32a4d42b0d9b90f117ca7c38e65038b04bd5c3f5.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f.lock\n", 397 | "[INFO|file_utils.py:1334] 2021-01-05 07:35:29,756 >> https://huggingface.co/distilgpt2/resolve/main/vocab.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp0dfokwoz\n", 398 | "Downloading: 100% 1.04M/1.04M [00:00<00:00, 2.04MB/s]\n", 399 | "[INFO|file_utils.py:1338] 2021-01-05 07:35:30,540 >> storing https://huggingface.co/distilgpt2/resolve/main/vocab.json in cache at /root/.cache/huggingface/transformers/55051ac97dcc32f0a736d21a32a4d42b0d9b90f117ca7c38e65038b04bd5c3f5.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f\n", 400 | "[INFO|file_utils.py:1341] 2021-01-05 07:35:30,540 >> creating metadata file for /root/.cache/huggingface/transformers/55051ac97dcc32f0a736d21a32a4d42b0d9b90f117ca7c38e65038b04bd5c3f5.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f\n", 401 | "01/05/2021 07:35:30 - INFO - filelock - Lock 139827134885224 released on /root/.cache/huggingface/transformers/55051ac97dcc32f0a736d21a32a4d42b0d9b90f117ca7c38e65038b04bd5c3f5.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f.lock\n", 402 | "01/05/2021 07:35:30 - INFO - filelock - Lock 139827134885112 acquired on /root/.cache/huggingface/transformers/9dfb299b74cdf7601ba7cd3a8073dbdac351caec0ed7ab5849b098b3c8ae3d57.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b.lock\n", 403 | "[INFO|file_utils.py:1334] 2021-01-05 07:35:30,812 >> https://huggingface.co/distilgpt2/resolve/main/merges.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmphmn8mzvm\n", 404 | "Downloading: 100% 456k/456k [00:00<00:00, 1.10MB/s]\n", 405 | "[INFO|file_utils.py:1338] 2021-01-05 07:35:31,501 >> storing https://huggingface.co/distilgpt2/resolve/main/merges.txt in cache 
at /root/.cache/huggingface/transformers/9dfb299b74cdf7601ba7cd3a8073dbdac351caec0ed7ab5849b098b3c8ae3d57.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b\n", 406 | "[INFO|file_utils.py:1341] 2021-01-05 07:35:31,501 >> creating metadata file for /root/.cache/huggingface/transformers/9dfb299b74cdf7601ba7cd3a8073dbdac351caec0ed7ab5849b098b3c8ae3d57.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b\n", 407 | "01/05/2021 07:35:31 - INFO - filelock - Lock 139827134885112 released on /root/.cache/huggingface/transformers/9dfb299b74cdf7601ba7cd3a8073dbdac351caec0ed7ab5849b098b3c8ae3d57.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b.lock\n", 408 | "01/05/2021 07:35:31 - INFO - filelock - Lock 139827134885840 acquired on /root/.cache/huggingface/transformers/accb287b5a5396b2597382916b6cc939fdab1366e89475a92338d3971b3d02b7.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0.lock\n", 409 | "[INFO|file_utils.py:1334] 2021-01-05 07:35:31,779 >> https://huggingface.co/distilgpt2/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmphm0j8_48\n", 410 | "Downloading: 100% 1.36M/1.36M [00:00<00:00, 2.64MB/s]\n", 411 | "[INFO|file_utils.py:1338] 2021-01-05 07:35:32,571 >> storing https://huggingface.co/distilgpt2/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/accb287b5a5396b2597382916b6cc939fdab1366e89475a92338d3971b3d02b7.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0\n", 412 | "[INFO|file_utils.py:1341] 2021-01-05 07:35:32,571 >> creating metadata file for /root/.cache/huggingface/transformers/accb287b5a5396b2597382916b6cc939fdab1366e89475a92338d3971b3d02b7.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0\n", 413 | "01/05/2021 07:35:32 - INFO - filelock - Lock 139827134885840 released on 
/root/.cache/huggingface/transformers/accb287b5a5396b2597382916b6cc939fdab1366e89475a92338d3971b3d02b7.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0.lock\n", 414 | "[INFO|tokenization_utils_base.py:1802] 2021-01-05 07:35:32,571 >> loading file https://huggingface.co/distilgpt2/resolve/main/vocab.json from cache at /root/.cache/huggingface/transformers/55051ac97dcc32f0a736d21a32a4d42b0d9b90f117ca7c38e65038b04bd5c3f5.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f\n", 415 | "[INFO|tokenization_utils_base.py:1802] 2021-01-05 07:35:32,571 >> loading file https://huggingface.co/distilgpt2/resolve/main/merges.txt from cache at /root/.cache/huggingface/transformers/9dfb299b74cdf7601ba7cd3a8073dbdac351caec0ed7ab5849b098b3c8ae3d57.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b\n", 416 | "[INFO|tokenization_utils_base.py:1802] 2021-01-05 07:35:32,571 >> loading file https://huggingface.co/distilgpt2/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/accb287b5a5396b2597382916b6cc939fdab1366e89475a92338d3971b3d02b7.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0\n", 417 | "01/05/2021 07:35:32 - INFO - filelock - Lock 139827117379992 acquired on /root/.cache/huggingface/transformers/43a212e83e76bcb07f45be584cf100676bdbbbe9c13f9e5c1c050049143a832f.a83d881ec4d624fd4b5826dd026e315246c48c67504ff91c0500570e291a54ba.lock\n", 418 | "[INFO|file_utils.py:1334] 2021-01-05 07:35:32,896 >> https://huggingface.co/distilgpt2/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpkzcur190\n", 419 | "Downloading: 100% 353M/353M [00:03<00:00, 90.8MB/s]\n", 420 | "[INFO|file_utils.py:1338] 2021-01-05 07:35:36,864 >> storing https://huggingface.co/distilgpt2/resolve/main/pytorch_model.bin in cache at 
/root/.cache/huggingface/transformers/43a212e83e76bcb07f45be584cf100676bdbbbe9c13f9e5c1c050049143a832f.a83d881ec4d624fd4b5826dd026e315246c48c67504ff91c0500570e291a54ba\n", 421 | "[INFO|file_utils.py:1341] 2021-01-05 07:35:36,864 >> creating metadata file for /root/.cache/huggingface/transformers/43a212e83e76bcb07f45be584cf100676bdbbbe9c13f9e5c1c050049143a832f.a83d881ec4d624fd4b5826dd026e315246c48c67504ff91c0500570e291a54ba\n", 422 | "01/05/2021 07:35:36 - INFO - filelock - Lock 139827117379992 released on /root/.cache/huggingface/transformers/43a212e83e76bcb07f45be584cf100676bdbbbe9c13f9e5c1c050049143a832f.a83d881ec4d624fd4b5826dd026e315246c48c67504ff91c0500570e291a54ba.lock\n", 423 | "[INFO|modeling_utils.py:1024] 2021-01-05 07:35:36,864 >> loading weights file https://huggingface.co/distilgpt2/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/43a212e83e76bcb07f45be584cf100676bdbbbe9c13f9e5c1c050049143a832f.a83d881ec4d624fd4b5826dd026e315246c48c67504ff91c0500570e291a54ba\n", 424 | "[INFO|modeling_utils.py:1140] 2021-01-05 07:35:40,144 >> All model checkpoint weights were used when initializing GPT2LMHeadModel.\n", 425 | "\n", 426 | "[INFO|modeling_utils.py:1149] 2021-01-05 07:35:40,144 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at distilgpt2.\n", 427 | "If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.\n", 428 | "100% 57/57 [00:01<00:00, 35.60ba/s]\n", 429 | "100% 7/7 [00:00<00:00, 44.50ba/s]\n", 430 | "100% 57/57 [00:03<00:00, 18.73ba/s]\n", 431 | "100% 7/7 [00:00<00:00, 21.16ba/s]\n", 432 | "[INFO|trainer.py:396] 2021-01-05 07:35:59,491 >> The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: .\n", 433 | "[INFO|trainer.py:396] 2021-01-05 07:35:59,491 >> The following columns in the evaluation set don't have a 
corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: .\n", 434 | "[WARNING|training_args.py:450] 2021-01-05 07:35:59,492 >> Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.\n", 435 | "[WARNING|training_args.py:450] 2021-01-05 07:35:59,493 >> Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.\n", 436 | "[INFO|trainer.py:719] 2021-01-05 07:35:59,493 >> ***** Running training *****\n", 437 | "[INFO|trainer.py:720] 2021-01-05 07:35:59,493 >> Num examples = 3015\n", 438 | "[INFO|trainer.py:721] 2021-01-05 07:35:59,493 >> Num Epochs = 3\n", 439 | "[INFO|trainer.py:722] 2021-01-05 07:35:59,493 >> Instantaneous batch size per device = 8\n", 440 | "[INFO|trainer.py:723] 2021-01-05 07:35:59,493 >> Total train batch size (w. parallel, distributed & accumulation) = 1\n", 441 | "[INFO|trainer.py:724] 2021-01-05 07:35:59,493 >> Gradient Accumulation steps = 1\n", 442 | "[INFO|trainer.py:725] 2021-01-05 07:35:59,493 >> Total optimization steps = 9045\n", 443 | "[WARNING|training_args.py:450] 2021-01-05 07:35:59,497 >> Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.\n", 444 | "[WARNING|training_args.py:467] 2021-01-05 07:35:59,497 >> Using deprecated `--per_gpu_eval_batch_size` argument which will be removed in a future version. Using `--per_device_eval_batch_size` is preferred.\n", 445 | "{'loss': 2.755792236328125, 'learning_rate': 4.5577667219458266e-05, 'epoch': 0.26533996683250416}\n", 446 | " 9% 800/9045 [00:39<06:45, 20.33it/s][WARNING|training_args.py:467] 2021-01-05 07:36:39,467 >> Using deprecated `--per_gpu_eval_batch_size` argument which will be removed in a future version. 
Using `--per_device_eval_batch_size` is preferred.\n", 447 | "[INFO|trainer.py:1441] 2021-01-05 07:36:39,467 >> ***** Running Evaluation *****\n", 448 | "[INFO|trainer.py:1442] 2021-01-05 07:36:39,467 >> Num examples = 338\n", 449 | "[INFO|trainer.py:1443] 2021-01-05 07:36:39,468 >> Batch size = 1\n", 450 | "\n", 451 | " 0% 0/338 [00:00> Saving model checkpoint to ./output/checkpoint-800\n", 487 | "[INFO|configuration_utils.py:289] 2021-01-05 07:36:42,836 >> Configuration saved in ./output/checkpoint-800/config.json\n", 488 | "[INFO|modeling_utils.py:814] 2021-01-05 07:36:43,819 >> Model weights saved in ./output/checkpoint-800/pytorch_model.bin\n", 489 | " 11% 1022/9045 [00:58<06:39, 20.10it/s]" 490 | ], 491 | "name": "stdout" 492 | } 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": { 498 | "id": "vj_aJ6myvpZR" 499 | }, 500 | "source": [ 501 | "## Play with model" 502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "metadata": { 507 | "id": "6riV2LC7vqPQ" 508 | }, 509 | "source": [ 510 | "from transformers import pipeline" 511 | ], 512 | "execution_count": null, 513 | "outputs": [] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "metadata": { 518 | "id": "0CV12Xz0wiI5" 519 | }, 520 | "source": [ 521 | "ft_generator = pipeline('text-generation', model='./output')" 522 | ], 523 | "execution_count": null, 524 | "outputs": [] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "metadata": { 529 | "id": "U7oCnZNhPCEZ" 530 | }, 531 | "source": [ 532 | "def PrettyPrintPrediction(text):\n", 533 | " print()\n", 534 | " text = text.replace('#', '\\n')\n", 535 | " print(text)" 536 | ], 537 | "execution_count": null, 538 | "outputs": [] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "metadata": { 543 | "id": "XBnHXEjvf0HX" 544 | }, 545 | "source": [ 546 | "ft_generator( )" 547 | ], 548 | "execution_count": null, 549 | "outputs": [] 550 | }, 551 | { 552 | "cell_type": "code", 553 | "metadata": { 554 | "colab": { 555 | "base_uri": 
"https://localhost:8080/" 556 | }, 557 | "id": "IMfMQy-o2Q__", 558 | "outputId": "47c392ca-659d-4e17-d04f-1a0598780ba1" 559 | }, 560 | "source": [ 561 | "for text in ft_generator(\"Vivek: Mihir sucks #Sreejith2: I agree! Tell me more#Vivek: Dude he always makes fun of me#Vivek:\", max_length=256, num_return_sequences=3):\n", 562 | " PrettyPrintPrediction(text['generated_text'])" 563 | ], 564 | "execution_count": null, 565 | "outputs": [ 566 | { 567 | "output_type": "stream", 568 | "text": [ 569 | "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" 570 | ], 571 | "name": "stderr" 572 | }, 573 | { 574 | "output_type": "stream", 575 | "text": [ 576 | "\n", 577 | "Vivek: Mihir sucks \\m/\n", 578 | "Sreejith2: I agree! Tell me more\n", 579 | "Vivek: Dude he always makes fun of me\n", 580 | "Vivek: 🤣\n", 581 | "Sreejith2: Hey thanks man. There is just one guy in the building who says \"hey fuck you bitch\"\n", 582 | "Sreejith2: https://youtu.be/z6YQtJd8sq\n", 583 | "Vivek: He has more jokes than Hitler\n", 584 | "Vivek: 🤣\n", 585 | "Sreejith2: Oho\n", 586 | "Sreejith2: Hey, all peace is here man\n", 587 | "Vivek: Peace, will probably find peace here soon\n", 588 | "Sreejith2: I think only if you actually feel safe.\n", 589 | "Vivek: Yup.\n", 590 | "Sreejith2: Can stay in your car next morning\n", 591 | "Vivek: There?\n", 592 | "Sreejith2: Come to the police station\n", 593 | "Sreejith2: Wassup man\n", 594 | "Vivek: Hey!\n", 595 | "Sreejith2: Whose name you're working on?\n", 596 | "Vivek: I want to go to your place\n", 597 | "Sreejith2: What time\n", 598 | "\n", 599 | "Vivek: Mihir sucks \\m/\n", 600 | "Sreejith2: I agree! 
Tell me more\n", 601 | "Vivek: Dude he always makes fun of me\n", 602 | "Vivek: He's the same person\n", 603 | "Vivek: 🙄\n", 604 | "Sreejith2: Hey how did you get your visa?\n", 605 | "Vivek: To the US?\n", 606 | "Sreejith2: Hahaha\n", 607 | "Vivek: I wanted to change the visa\n", 608 | "Vivek: I decided to apply for it\n", 609 | "Sreejith2: In Bangalore for the last 2 days\n", 610 | "Vivek: And you've come across this\n", 611 | "Vivek: 🙈\n", 612 | "Vivek: Also I'm going to go to London for the last 3 days\n", 613 | "Sreejith2: Hey y'all have you gotten the visa?\n", 614 | "Vivek: And you'll need to submit it to the new US mail\n", 615 | "Vivek: I have no visa\n", 616 | "Vivek: And there is no visa\n", 617 | "Sreejith2: Ah ok okay😋\n", 618 | "Vivek: 👻\n", 619 | "Sreejith2: Also, don't you have one that will do it\n", 620 | "Sreejith2\n", 621 | "\n", 622 | "Vivek: Mihir sucks \\m/\n", 623 | "Sreejith2: I agree! Tell me more\n", 624 | "Vivek: Dude he always makes fun of me\n", 625 | "Vivek: Have you met?\n", 626 | "Sreejith2: Haan yeah\n", 627 | "Vivek: I'm meeting him tomorrow\n", 628 | "Sreejith2: \n", 629 | "Sreejith2: \n", 630 | "Vivek: He's being called\n", 631 | "Vivek: 😂😂\n", 632 | "Vivek: Hahahahahahaha\n", 633 | "Vivek: 😬\n", 634 | "Sreejith2: I think he's a weird character\n", 635 | "Vivek: In SF and tech\n", 636 | "Sreejith2: Also\n", 637 | "Sreejith2: I don't know if she's in SF now\n", 638 | "Vivek: 🙈\n", 639 | "Sreejith2: Yo what location?\n", 640 | "Vivek: Eyo\n", 641 | "Sreejith2: Hey\n", 642 | "Sreejith2: Yoyo\n", 643 | "Vivek: There\n", 644 | "Vivek: You\n", 645 | "Vivek: Free for a call?\n", 646 | "Vivek: For a call?\n", 647 | "Sreejith2: Yo\n", 648 | "Vivek\n" 649 | ], 650 | "name": "stdout" 651 | } 652 | ] 653 | }, 654 | { 655 | "cell_type": "markdown", 656 | "metadata": { 657 | "id": "RDRDEqH2vGMV" 658 | }, 659 | "source": [ 660 | "## Download model" 661 | ] 662 | }, 663 | { 664 | "cell_type": "markdown", 665 | "metadata": { 666 | "id": 
"8gp4JYnzjMRf" 667 | }, 668 | "source": [ 669 | "Download the model so you can use it with the Chrome extension." 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "metadata": { 675 | "colab": { 676 | "base_uri": "https://localhost:8080/" 677 | }, 678 | "id": "fac6_VtIvHQ_", 679 | "outputId": "80f41b42-e58a-4228-cdde-a38945fe1b37" 680 | }, 681 | "source": [ 682 | "!zip model.zip ./output/*" 683 | ], 684 | "execution_count": null, 685 | "outputs": [ 686 | { 687 | "output_type": "stream", 688 | "text": [ 689 | " adding: output/checkpoint-1600/ (stored 0%)\n", 690 | " adding: output/checkpoint-2400/ (stored 0%)\n", 691 | " adding: output/checkpoint-3200/ (stored 0%)\n", 692 | " adding: output/checkpoint-4000/ (stored 0%)\n", 693 | " adding: output/checkpoint-800/ (stored 0%)\n", 694 | " adding: output/config.json (deflated 51%)\n", 695 | " adding: output/eval_results_clm.txt (stored 0%)\n", 696 | " adding: output/merges.txt (deflated 53%)\n", 697 | " adding: output/pytorch_model.bin (deflated 9%)\n", 698 | " adding: output/special_tokens_map.json (deflated 52%)\n", 699 | " adding: output/tokenizer_config.json (deflated 38%)\n", 700 | " adding: output/trainer_state.json (deflated 70%)\n", 701 | " adding: output/training_args.bin (deflated 46%)\n", 702 | " adding: output/train_results.txt (deflated 10%)\n", 703 | " adding: output/vocab.json (deflated 59%)\n" 704 | ], 705 | "name": "stdout" 706 | } 707 | ] 708 | }, 709 | { 710 | "cell_type": "code", 711 | "metadata": { 712 | "colab": { 713 | "base_uri": "https://localhost:8080/" 714 | }, 715 | "id": "1Cvky9WHxLSd", 716 | "outputId": "c091e24f-bb09-4677-baa8-abcfb197dbcb" 717 | }, 718 | "source": [ 719 | "ls -l" 720 | ], 721 | "execution_count": null, 722 | "outputs": [ 723 | { 724 | "output_type": "stream", 725 | "text": [ 726 | "total 302268\n", 727 | "-rw-r--r-- 1 root root 1276333 Jan 4 07:30 data.txt\n", 728 | "-rw-r--r-- 1 root root 305021348 Jan 4 07:58 model.zip\n", 729 | "drwxr-xr-x 7 root root 4096 Jan 4 
07:38 \u001b[0m\u001b[01;34moutput\u001b[0m/\n", 730 | "drwxr-xr-x 3 root root 4096 Jan 4 07:31 \u001b[01;34mruns\u001b[0m/\n", 731 | "drwxr-xr-x 1 root root 4096 Dec 21 17:29 \u001b[01;34msample_data\u001b[0m/\n", 732 | "-rw-r--r-- 1 root root 127482 Jan 4 07:30 test.txt\n", 733 | "-rw-r--r-- 1 root root 1148810 Jan 4 07:30 train.txt\n", 734 | "drwxr-xr-x 15 root root 4096 Jan 4 07:28 \u001b[01;34mtransformers\u001b[0m/\n", 735 | "-rw-r--r-- 1 root root 188024 Jan 4 07:29 'WhatsApp Chat with 5 Years Time 🌞.txt'\n", 736 | "-rw-r--r-- 1 root root 96072 Jan 4 07:29 'WhatsApp Chat with Mihir London.txt'\n", 737 | "-rw-r--r-- 1 root root 493383 Jan 4 07:30 'WhatsApp Chat with Rishi Amreeka.txt'\n", 738 | "-rw-r--r-- 1 root root 271150 Jan 4 07:29 'WhatsApp Chat with Sreejith2.txt'\n", 739 | "-rw-r--r-- 1 root root 351486 Jan 4 07:29 'WhatsApp Chat with Sreejith.txt'\n", 740 | "-rw-r--r-- 1 root root 509144 Jan 4 07:30 'WhatsApp Chat with Vikrant London.txt'\n" 741 | ], 742 | "name": "stdout" 743 | } 744 | ] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "metadata": { 749 | "colab": { 750 | "base_uri": "https://localhost:8080/", 751 | "height": 17 752 | }, 753 | "id": "IF1Qh8X6xrt1", 754 | "outputId": "f4c9ad0f-b020-47ff-8eee-0fe0ac1f8e1b" 755 | }, 756 | "source": [ 757 | "files.download('model.zip')" 758 | ], 759 | "execution_count": null, 760 | "outputs": [ 761 | { 762 | "output_type": "display_data", 763 | "data": { 764 | "application/javascript": [ 765 | "\n", 766 | " async function download(id, filename, size) {\n", 767 | " if (!google.colab.kernel.accessAllowed) {\n", 768 | " return;\n", 769 | " }\n", 770 | " const div = document.createElement('div');\n", 771 | " const label = document.createElement('label');\n", 772 | " label.textContent = `Downloading \"${filename}\": `;\n", 773 | " div.appendChild(label);\n", 774 | " const progress = document.createElement('progress');\n", 775 | " progress.max = size;\n", 776 | " div.appendChild(progress);\n", 777 | " 
document.body.appendChild(div);\n", 778 | "\n", 779 | " const buffers = [];\n", 780 | " let downloaded = 0;\n", 781 | "\n", 782 | " const channel = await google.colab.kernel.comms.open(id);\n", 783 | " // Send a message to notify the kernel that we're ready.\n", 784 | " channel.send({})\n", 785 | "\n", 786 | " for await (const message of channel.messages) {\n", 787 | " // Send a message to notify the kernel that we're ready.\n", 788 | " channel.send({})\n", 789 | " if (message.buffers) {\n", 790 | " for (const buffer of message.buffers) {\n", 791 | " buffers.push(buffer);\n", 792 | " downloaded += buffer.byteLength;\n", 793 | " progress.value = downloaded;\n", 794 | " }\n", 795 | " }\n", 796 | " }\n", 797 | " const blob = new Blob(buffers, {type: 'application/binary'});\n", 798 | " const a = document.createElement('a');\n", 799 | " a.href = window.URL.createObjectURL(blob);\n", 800 | " a.download = filename;\n", 801 | " div.appendChild(a);\n", 802 | " a.click();\n", 803 | " div.remove();\n", 804 | " }\n", 805 | " " 806 | ], 807 | "text/plain": [ 808 | "" 809 | ] 810 | }, 811 | "metadata": { 812 | "tags": [] 813 | } 814 | }, 815 | { 816 | "output_type": "display_data", 817 | "data": { 818 | "application/javascript": [ 819 | "download(\"download_8df03770-4b81-4eec-8bb7-342f78da7a4a\", \"model.zip\", 305021348)" 820 | ], 821 | "text/plain": [ 822 | "" 823 | ] 824 | }, 825 | "metadata": { 826 | "tags": [] 827 | } 828 | } 829 | ] 830 | }, 831 | { 832 | "cell_type": "markdown", 833 | "metadata": { 834 | "id": "Hk9baYSr62rD" 835 | }, 836 | "source": [ 837 | "## Load a saved model" 838 | ] 839 | }, 840 | { 841 | "cell_type": "markdown", 842 | "metadata": { 843 | "id": "l_4X0yePiqv2" 844 | }, 845 | "source": [ 846 | "Use this to play with a model you've previously downloaded. You will need to connect colab to a locally running Jupyter runtime." 
847 | ] 848 | }, 849 | { 850 | "cell_type": "code", 851 | "metadata": { 852 | "id": "gfxgcKlb6_Uf" 853 | }, 854 | "source": [ 855 | "from transformers import pipeline" 856 | ], 857 | "execution_count": null, 858 | "outputs": [] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "metadata": { 863 | "id": "PJT7FdFG7M8q" 864 | }, 865 | "source": [ 866 | "ft_generator = pipeline('text-generation', model='../../Downloads/output_2')" 867 | ], 868 | "execution_count": null, 869 | "outputs": [] 870 | }, 871 | { 872 | "cell_type": "code", 873 | "metadata": { 874 | "colab": { 875 | "base_uri": "https://localhost:8080/" 876 | }, 877 | "id": "HscWDyJgz0fy", 878 | "outputId": "ddf36a2c-8e73-4943-8ca2-9a3188286874" 879 | }, 880 | "source": [ 881 | "for text in ft_generator(\"Vivek: Mihir sucks :(#Sreejith2: I agree! Tell me more#Vivek: Dude he always makes fun of me#Vivek:\", max_length=100, num_return_sequences=3, do_sample=True, eos_token_id=2, pad_token_id=0, skip_special_tokens=True, top_k=50, top_p=0.95):\n", 882 | " PrettyPrintPrediction(text['generated_text'])" 883 | ], 884 | "execution_count": null, 885 | "outputs": [ 886 | { 887 | "output_type": "stream", 888 | "text": [ 889 | "\n", 890 | "Vivek: Mihir sucks :(\n", 891 | "Sreejith2: I agree! Tell me more\n", 892 | "Vivek: Dude he always makes fun of me\n", 893 | "Vivek: I just wanted to know if there is any one thing that he does in life for me no?\n", 894 | "\n", 895 | "\n", 896 | "Vivek: Mihir sucks :(\n", 897 | "Sreejith2: I agree! Tell me more\n", 898 | "Vivek: Dude he always makes fun of me\n", 899 | "Vivek: And what did you mean?\n", 900 | "!!!!!!!!!!!!!!\n", 901 | "\n", 902 | "Vivek: Mihir sucks :(\n", 903 | "Sreejith2: I agree! 
Tell me more\n", 904 | "Vivek: Dude he always makes fun of me\n", 905 | "Vivek: That was a big problem for me when I was younger 🤣\n", 906 | "!!!!!!\n" 907 | ], 908 | "name": "stdout" 909 | } 910 | ] 911 | } 912 | ] 913 | } -------------------------------------------------------------------------------- /demo.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nuwandavek/you/9113fc3cabf2a7de7718938e93961faff7b7d8ee/demo.gif -------------------------------------------------------------------------------- /extension/content_scripts/jquery.min.js: -------------------------------------------------------------------------------- 1 | /*! jQuery v3.5.1 | (c) JS Foundation and other contributors | jquery.org/license */ 2 | !function(e,t){"use strict";"object"==typeof module&&"object"==typeof module.exports?module.exports=e.document?t(e,!0):function(e){if(!e.document)throw new Error("jQuery requires a window with a document");return t(e)}:t(e)}("undefined"!=typeof window?window:this,function(C,e){"use strict";var t=[],r=Object.getPrototypeOf,s=t.slice,g=t.flat?function(e){return t.flat.call(e)}:function(e){return t.concat.apply([],e)},u=t.push,i=t.indexOf,n={},o=n.toString,v=n.hasOwnProperty,a=v.toString,l=a.call(Object),y={},m=function(e){return"function"==typeof e&&"number"!=typeof e.nodeType},x=function(e){return null!=e&&e===e.window},E=C.document,c={type:!0,src:!0,nonce:!0,noModule:!0};function b(e,t,n){var r,i,o=(n=n||E).createElement("script");if(o.text=e,t)for(r in c)(i=t[r]||t.getAttribute&&t.getAttribute(r))&&o.setAttribute(r,i);n.head.appendChild(o).parentNode.removeChild(o)}function w(e){return null==e?e+"":"object"==typeof e||"function"==typeof e?n[o.call(e)]||"object":typeof e}var f="3.5.1",S=function(e,t){return new S.fn.init(e,t)};function p(e){var t=!!e&&"length"in e&&e.length,n=w(e);return!m(e)&&!x(e)&&("array"===n||0===t||"number"==typeof t&&0+~]|"+M+")"+M+"*"),U=new RegExp(M+"|>"),X=new 
RegExp(F),V=new RegExp("^"+I+"$"),G={ID:new RegExp("^#("+I+")"),CLASS:new RegExp("^\\.("+I+")"),TAG:new RegExp("^("+I+"|[*])"),ATTR:new RegExp("^"+W),PSEUDO:new RegExp("^"+F),CHILD:new RegExp("^:(only|first|last|nth|nth-last)-(child|of-type)(?:\$"+M+"*(even|odd|(([+-]|)(\\d*)n|)"+M+"*(?:([+-]|)"+M+"*(\\d+)|))"+M+"*\$|)","i"),bool:new RegExp("^(?:"+R+")$","i"),needsContext:new RegExp("^"+M+"*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\$"+M+"*((?:-\\d)?\\d*)"+M+"*\$|)(?=[^-]|$)","i")},Y=/HTML$/i,Q=/^(?:input|select|textarea|button)$/i,J=/^h\d$/i,K=/^[^{]+\{\s*\[native \w/,Z=/^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/,ee=/[+~]/,te=new RegExp("\\\\[\\da-fA-F]{1,6}"+M+"?|\\\\([^\\r\\n\\f])","g"),ne=function(e,t){var n="0x"+e.slice(1)-65536;return t||(n<0?String.fromCharCode(n+65536):String.fromCharCode(n>>10|55296,1023&n|56320))},re=/([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g,ie=function(e,t){return t?"\0"===e?"\ufffd":e.slice(0,-1)+"\\"+e.charCodeAt(e.length-1).toString(16)+" ":"\\"+e},oe=function(){T()},ae=be(function(e){return!0===e.disabled&&"fieldset"===e.nodeName.toLowerCase()},{dir:"parentNode",next:"legend"});try{H.apply(t=O.call(p.childNodes),p.childNodes),t[p.childNodes.length].nodeType}catch(e){H={apply:t.length?function(e,t){L.apply(e,O.call(t))}:function(e,t){var n=e.length,r=0;while(e[n++]=t[r++]);e.length=n-1}}}function se(t,e,n,r){var i,o,a,s,u,l,c,f=e&&e.ownerDocument,p=e?e.nodeType:9;if(n=n||[],"string"!=typeof t||!t||1!==p&&9!==p&&11!==p)return n;if(!r&&(T(e),e=e||C,E)){if(11!==p&&(u=Z.exec(t)))if(i=u[1]){if(9===p){if(!(a=e.getElementById(i)))return n;if(a.id===i)return n.push(a),n}else if(f&&(a=f.getElementById(i))&&y(e,a)&&a.id===i)return n.push(a),n}else{if(u[2])return H.apply(n,e.getElementsByTagName(t)),n;if((i=u[3])&&d.getElementsByClassName&&e.getElementsByClassName)return H.apply(n,e.getElementsByClassName(i)),n}if(d.qsa&&!N[t+" 
"]&&(!v||!v.test(t))&&(1!==p||"object"!==e.nodeName.toLowerCase())){if(c=t,f=e,1===p&&(U.test(t)||z.test(t))){(f=ee.test(t)&&ye(e.parentNode)||e)===e&&d.scope||((s=e.getAttribute("id"))?s=s.replace(re,ie):e.setAttribute("id",s=S)),o=(l=h(t)).length;while(o--)l[o]=(s?"#"+s:":scope")+" "+xe(l[o]);c=l.join(",")}try{return H.apply(n,f.querySelectorAll(c)),n}catch(e){N(t,!0)}finally{s===S&&e.removeAttribute("id")}}}return g(t.replace($,"$1"),e,n,r)}function ue(){var r=[];return function e(t,n){return r.push(t+" ")>b.cacheLength&&delete e[r.shift()],e[t+" "]=n}}function le(e){return e[S]=!0,e}function ce(e){var t=C.createElement("fieldset");try{return!!e(t)}catch(e){return!1}finally{t.parentNode&&t.parentNode.removeChild(t),t=null}}function fe(e,t){var n=e.split("|"),r=n.length;while(r--)b.attrHandle[n[r]]=t}function pe(e,t){var n=t&&e,r=n&&1===e.nodeType&&1===t.nodeType&&e.sourceIndex-t.sourceIndex;if(r)return r;if(n)while(n=n.nextSibling)if(n===t)return-1;return e?1:-1}function de(t){return function(e){return"input"===e.nodeName.toLowerCase()&&e.type===t}}function he(n){return function(e){var t=e.nodeName.toLowerCase();return("input"===t||"button"===t)&&e.type===n}}function ge(t){return function(e){return"form"in e?e.parentNode&&!1===e.disabled?"label"in e?"label"in e.parentNode?e.parentNode.disabled===t:e.disabled===t:e.isDisabled===t||e.isDisabled!==!t&&ae(e)===t:e.disabled===t:"label"in e&&e.disabled===t}}function ve(a){return le(function(o){return o=+o,le(function(e,t){var n,r=a([],e.length,o),i=r.length;while(i--)e[n=r[i]]&&(e[n]=!(t[n]=e[n]))})})}function ye(e){return e&&"undefined"!=typeof e.getElementsByTagName&&e}for(e in d=se.support={},i=se.isXML=function(e){var t=e.namespaceURI,n=(e.ownerDocument||e).documentElement;return!Y.test(t||n&&n.nodeName||"HTML")},T=se.setDocument=function(e){var t,n,r=e?e.ownerDocument||e:p;return 
r!=C&&9===r.nodeType&&r.documentElement&&(a=(C=r).documentElement,E=!i(C),p!=C&&(n=C.defaultView)&&n.top!==n&&(n.addEventListener?n.addEventListener("unload",oe,!1):n.attachEvent&&n.attachEvent("onunload",oe)),d.scope=ce(function(e){return a.appendChild(e).appendChild(C.createElement("div")),"undefined"!=typeof e.querySelectorAll&&!e.querySelectorAll(":scope fieldset div").length}),d.attributes=ce(function(e){return e.className="i",!e.getAttribute("className")}),d.getElementsByTagName=ce(function(e){return e.appendChild(C.createComment("")),!e.getElementsByTagName("*").length}),d.getElementsByClassName=K.test(C.getElementsByClassName),d.getById=ce(function(e){return a.appendChild(e).id=S,!C.getElementsByName||!C.getElementsByName(S).length}),d.getById?(b.filter.ID=function(e){var t=e.replace(te,ne);return function(e){return e.getAttribute("id")===t}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n=t.getElementById(e);return n?[n]:[]}}):(b.filter.ID=function(e){var n=e.replace(te,ne);return function(e){var t="undefined"!=typeof e.getAttributeNode&&e.getAttributeNode("id");return t&&t.value===n}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n,r,i,o=t.getElementById(e);if(o){if((n=o.getAttributeNode("id"))&&n.value===e)return[o];i=t.getElementsByName(e),r=0;while(o=i[r++])if((n=o.getAttributeNode("id"))&&n.value===e)return[o]}return[]}}),b.find.TAG=d.getElementsByTagName?function(e,t){return"undefined"!=typeof t.getElementsByTagName?t.getElementsByTagName(e):d.qsa?t.querySelectorAll(e):void 0}:function(e,t){var n,r=[],i=0,o=t.getElementsByTagName(e);if("*"===e){while(n=o[i++])1===n.nodeType&&r.push(n);return r}return o},b.find.CLASS=d.getElementsByClassName&&function(e,t){if("undefined"!=typeof t.getElementsByClassName&&E)return t.getElementsByClassName(e)},s=[],v=[],(d.qsa=K.test(C.querySelectorAll))&&(ce(function(e){var 
t;a.appendChild(e).innerHTML="",e.querySelectorAll("[msallowcapture^='']").length&&v.push("[*^$]="+M+"*(?:''|\"\")"),e.querySelectorAll("[selected]").length||v.push("\\["+M+"*(?:value|"+R+")"),e.querySelectorAll("[id~="+S+"-]").length||v.push("~="),(t=C.createElement("input")).setAttribute("name",""),e.appendChild(t),e.querySelectorAll("[name='']").length||v.push("\\["+M+"*name"+M+"*="+M+"*(?:''|\"\")"),e.querySelectorAll(":checked").length||v.push(":checked"),e.querySelectorAll("a#"+S+"+*").length||v.push(".#.+[+~]"),e.querySelectorAll("\\\f"),v.push("[\\r\\n\\f]")}),ce(function(e){e.innerHTML="";var t=C.createElement("input");t.setAttribute("type","hidden"),e.appendChild(t).setAttribute("name","D"),e.querySelectorAll("[name=d]").length&&v.push("name"+M+"*[*^$|!~]?="),2!==e.querySelectorAll(":enabled").length&&v.push(":enabled",":disabled"),a.appendChild(e).disabled=!0,2!==e.querySelectorAll(":disabled").length&&v.push(":enabled",":disabled"),e.querySelectorAll("*,:x"),v.push(",.*:")})),(d.matchesSelector=K.test(c=a.matches||a.webkitMatchesSelector||a.mozMatchesSelector||a.oMatchesSelector||a.msMatchesSelector))&&ce(function(e){d.disconnectedMatch=c.call(e,"*"),c.call(e,"[s!='']:x"),s.push("!=",F)}),v=v.length&&new RegExp(v.join("|")),s=s.length&&new RegExp(s.join("|")),t=K.test(a.compareDocumentPosition),y=t||K.test(a.contains)?function(e,t){var n=9===e.nodeType?e.documentElement:e,r=t&&t.parentNode;return e===r||!(!r||1!==r.nodeType||!(n.contains?n.contains(r):e.compareDocumentPosition&&16&e.compareDocumentPosition(r)))}:function(e,t){if(t)while(t=t.parentNode)if(t===e)return!0;return!1},D=t?function(e,t){if(e===t)return l=!0,0;var n=!e.compareDocumentPosition-!t.compareDocumentPosition;return n||(1&(n=(e.ownerDocument||e)==(t.ownerDocument||t)?e.compareDocumentPosition(t):1)||!d.sortDetached&&t.compareDocumentPosition(e)===n?e==C||e.ownerDocument==p&&y(p,e)?-1:t==C||t.ownerDocument==p&&y(p,t)?1:u?P(u,e)-P(u,t):0:4&n?-1:1)}:function(e,t){if(e===t)return 
l=!0,0;var n,r=0,i=e.parentNode,o=t.parentNode,a=[e],s=[t];if(!i||!o)return e==C?-1:t==C?1:i?-1:o?1:u?P(u,e)-P(u,t):0;if(i===o)return pe(e,t);n=e;while(n=n.parentNode)a.unshift(n);n=t;while(n=n.parentNode)s.unshift(n);while(a[r]===s[r])r++;return r?pe(a[r],s[r]):a[r]==p?-1:s[r]==p?1:0}),C},se.matches=function(e,t){return se(e,null,null,t)},se.matchesSelector=function(e,t){if(T(e),d.matchesSelector&&E&&!N[t+" "]&&(!s||!s.test(t))&&(!v||!v.test(t)))try{var n=c.call(e,t);if(n||d.disconnectedMatch||e.document&&11!==e.document.nodeType)return n}catch(e){N(t,!0)}return 0":{dir:"parentNode",first:!0}," ":{dir:"parentNode"},"+":{dir:"previousSibling",first:!0},"~":{dir:"previousSibling"}},preFilter:{ATTR:function(e){return e[1]=e[1].replace(te,ne),e[3]=(e[3]||e[4]||e[5]||"").replace(te,ne),"~="===e[2]&&(e[3]=" "+e[3]+" "),e.slice(0,4)},CHILD:function(e){return e[1]=e[1].toLowerCase(),"nth"===e[1].slice(0,3)?(e[3]||se.error(e[0]),e[4]=+(e[4]?e[5]+(e[6]||1):2*("even"===e[3]||"odd"===e[3])),e[5]=+(e[7]+e[8]||"odd"===e[3])):e[3]&&se.error(e[0]),e},PSEUDO:function(e){var t,n=!e[6]&&e[2];return G.CHILD.test(e[0])?null:(e[3]?e[2]=e[4]||e[5]||"":n&&X.test(n)&&(t=h(n,!0))&&(t=n.indexOf(")",n.length-t)-n.length)&&(e[0]=e[0].slice(0,t),e[2]=n.slice(0,t)),e.slice(0,3))}},filter:{TAG:function(e){var t=e.replace(te,ne).toLowerCase();return"*"===e?function(){return!0}:function(e){return e.nodeName&&e.nodeName.toLowerCase()===t}},CLASS:function(e){var t=m[e+" "];return t||(t=new RegExp("(^|"+M+")"+e+"("+M+"|$)"))&&m(e,function(e){return t.test("string"==typeof e.className&&e.className||"undefined"!=typeof e.getAttribute&&e.getAttribute("class")||"")})},ATTR:function(n,r,i){return function(e){var t=se.attr(e,n);return null==t?"!="===r:!r||(t+="","="===r?t===i:"!="===r?t!==i:"^="===r?i&&0===t.indexOf(i):"*="===r?i&&-1:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i;function D(e,n,r){return m(n)?S.grep(e,function(e,t){return!!n.call(e,t,e)!==r}):n.nodeType?S.grep(e,function(e){return 
e===n!==r}):"string"!=typeof n?S.grep(e,function(e){return-1)[^>]*|#([\w-]+))$/;(S.fn.init=function(e,t,n){var r,i;if(!e)return this;if(n=n||j,"string"==typeof e){if(!(r="<"===e[0]&&">"===e[e.length-1]&&3<=e.length?[null,e,null]:q.exec(e))||!r[1]&&t)return!t||t.jquery?(t||n).find(e):this.constructor(t).find(e);if(r[1]){if(t=t instanceof S?t[0]:t,S.merge(this,S.parseHTML(r[1],t&&t.nodeType?t.ownerDocument||t:E,!0)),N.test(r[1])&&S.isPlainObject(t))for(r in t)m(this[r])?this[r](t[r]):this.attr(r,t[r]);return this}return(i=E.getElementById(r[2]))&&(this[0]=i,this.length=1),this}return e.nodeType?(this[0]=e,this.length=1,this):m(e)?void 0!==n.ready?n.ready(e):e(S):S.makeArray(e,this)}).prototype=S.fn,j=S(E);var L=/^(?:parents|prev(?:Until|All))/,H={children:!0,contents:!0,next:!0,prev:!0};function O(e,t){while((e=e[t])&&1!==e.nodeType);return e}S.fn.extend({has:function(e){var t=S(e,this),n=t.length;return this.filter(function(){for(var e=0;e\x20\t\r\n\f]*)/i,he=/^$|^module$|\/(?:java|ecma)script/i;ce=E.createDocumentFragment().appendChild(E.createElement("div")),(fe=E.createElement("input")).setAttribute("type","radio"),fe.setAttribute("checked","checked"),fe.setAttribute("name","t"),ce.appendChild(fe),y.checkClone=ce.cloneNode(!0).cloneNode(!0).lastChild.checked,ce.innerHTML="",y.noCloneChecked=!!ce.cloneNode(!0).lastChild.defaultValue,ce.innerHTML="",y.option=!!ce.lastChild;var ge={thead:[1,"","

"],col:[2,"","

"],tr:[2,"","

"],td:[3,"","

"],_default:[0,"",""]};function ve(e,t){var n;return n="undefined"!=typeof e.getElementsByTagName?e.getElementsByTagName(t||"*"):"undefined"!=typeof e.querySelectorAll?e.querySelectorAll(t||"*"):[],void 0===t||t&&A(e,t)?S.merge([e],n):n}function ye(e,t){for(var n=0,r=e.length;n",""]);var me=/<|&#?\w+;/;function xe(e,t,n,r,i){for(var o,a,s,u,l,c,f=t.createDocumentFragment(),p=[],d=0,h=e.length;d\s*$/g;function qe(e,t){return A(e,"table")&&A(11!==t.nodeType?t:t.firstChild,"tr")&&S(e).children("tbody")[0]||e}function Le(e){return e.type=(null!==e.getAttribute("type"))+"/"+e.type,e}function He(e){return"true/"===(e.type||"").slice(0,5)?e.type=e.type.slice(5):e.removeAttribute("type"),e}function Oe(e,t){var n,r,i,o,a,s;if(1===t.nodeType){if(Y.hasData(e)&&(s=Y.get(e).events))for(i in Y.remove(t,"handle events"),s)for(n=0,r=s[i].length;n").attr(n.scriptAttrs||{}).prop({charset:n.scriptCharset,src:n.url}).on("load error",i=function(e){r.remove(),i=null,e&&t("error"===e.type?404:200,e.type)}),E.head.appendChild(r[0])},abort:function(){i&&i()}}});var Ut,Xt=[],Vt=/(=)\?(?=&|$)|\?\?/;S.ajaxSetup({jsonp:"callback",jsonpCallback:function(){var e=Xt.pop()||S.expando+"_"+Ct.guid++;return this[e]=!0,e}}),S.ajaxPrefilter("json jsonp",function(e,t,n){var r,i,o,a=!1!==e.jsonp&&(Vt.test(e.url)?"url":"string"==typeof e.data&&0===(e.contentType||"").indexOf("application/x-www-form-urlencoded")&&Vt.test(e.data)&&"data");if(a||"jsonp"===e.dataTypes[0])return r=e.jsonpCallback=m(e.jsonpCallback)?e.jsonpCallback():e.jsonpCallback,a?e[a]=e[a].replace(Vt,"$1"+r):!1!==e.jsonp&&(e.url+=(Et.test(e.url)?"&":"?")+e.jsonp+"="+r),e.converters["script json"]=function(){return o||S.error(r+" was not called"),o[0]},e.dataTypes[0]="json",i=C[r],C[r]=function(){o=arguments},n.always(function(){void 0===i?S(C).removeProp(r):C[r]=i,e[r]&&(e.jsonpCallback=t.jsonpCallback,Xt.push(r)),o&&m(i)&&i(o[0]),o=i=void 
0}),"script"}),y.createHTMLDocument=((Ut=E.implementation.createHTMLDocument("").body).innerHTML="",2===Ut.childNodes.length),S.parseHTML=function(e,t,n){return"string"!=typeof e?[]:("boolean"==typeof t&&(n=t,t=!1),t||(y.createHTMLDocument?((r=(t=E.implementation.createHTMLDocument("")).createElement("base")).href=E.location.href,t.head.appendChild(r)):t=E),o=!n&&[],(i=N.exec(e))?[t.createElement(i[1])]:(i=xe([e],t,o),o&&o.length&&S(o).remove(),S.merge([],i.childNodes)));var r,i,o},S.fn.load=function(e,t,n){var r,i,o,a=this,s=e.indexOf(" ");return-1").append(S.parseHTML(e)).find(r):e)}).always(n&&function(e,t){a.each(function(){n.apply(this,o||[e.responseText,t,e])})}),this},S.expr.pseudos.animated=function(t){return S.grep(S.timers,function(e){return t===e.elem}).length},S.offset={setOffset:function(e,t,n){var r,i,o,a,s,u,l=S.css(e,"position"),c=S(e),f={};"static"===l&&(e.style.position="relative"),s=c.offset(),o=S.css(e,"top"),u=S.css(e,"left"),("absolute"===l||"fixed"===l)&&-1<(o+u).indexOf("auto")?(a=(r=c.position()).top,i=r.left):(a=parseFloat(o)||0,i=parseFloat(u)||0),m(t)&&(t=t.call(e,n,S.extend({},s))),null!=t.top&&(f.top=t.top-s.top+a),null!=t.left&&(f.left=t.left-s.left+i),"using"in t?t.using.call(e,f):("number"==typeof f.top&&(f.top+="px"),"number"==typeof f.left&&(f.left+="px"),c.css(f))}},S.fn.extend({offset:function(t){if(arguments.length)return void 0===t?this:this.each(function(e){S.offset.setOffset(this,t,e)});var e,n,r=this[0];return r?r.getClientRects().length?(e=r.getBoundingClientRect(),n=r.ownerDocument.defaultView,{top:e.top+n.pageYOffset,left:e.left+n.pageXOffset}):{top:0,left:0}:void 0},position:function(){if(this[0]){var 
e,t,n,r=this[0],i={top:0,left:0};if("fixed"===S.css(r,"position"))t=r.getBoundingClientRect();else{t=this.offset(),n=r.ownerDocument,e=r.offsetParent||n.documentElement;while(e&&(e===n.body||e===n.documentElement)&&"static"===S.css(e,"position"))e=e.parentNode;e&&e!==r&&1===e.nodeType&&((i=S(e).offset()).top+=S.css(e,"borderTopWidth",!0),i.left+=S.css(e,"borderLeftWidth",!0))}return{top:t.top-i.top-S.css(r,"marginTop",!0),left:t.left-i.left-S.css(r,"marginLeft",!0)}}},offsetParent:function(){return this.map(function(){var e=this.offsetParent;while(e&&"static"===S.css(e,"position"))e=e.offsetParent;return e||re})}}),S.each({scrollLeft:"pageXOffset",scrollTop:"pageYOffset"},function(t,i){var o="pageYOffset"===i;S.fn[t]=function(e){return $(this,function(e,t,n){var r;if(x(e)?r=e:9===e.nodeType&&(r=e.defaultView),void 0===n)return r?r[i]:e[t];r?r.scrollTo(o?r.pageXOffset:n,o?n:r.pageYOffset):e[t]=n},t,e,arguments.length)}}),S.each(["top","left"],function(e,n){S.cssHooks[n]=$e(y.pixelPosition,function(e,t){if(t)return t=Be(e,n),Me.test(t)?S(e).position()[n]+"px":t})}),S.each({Height:"height",Width:"width"},function(a,s){S.each({padding:"inner"+a,content:s,"":"outer"+a},function(r,o){S.fn[o]=function(e,t){var n=arguments.length&&(r||"boolean"!=typeof e),i=r||(!0===e||!0===t?"margin":"border");return $(this,function(e,t,n){var r;return x(e)?0===o.indexOf("outer")?e["inner"+a]:e.document.documentElement["client"+a]:9===e.nodeType?(r=e.documentElement,Math.max(e.body["scroll"+a],r["scroll"+a],e.body["offset"+a],r["offset"+a],r["client"+a])):void 0===n?S.css(e,t,i):S.style(e,t,n,i)},s,n?e:void 0,n)}})}),S.each(["ajaxStart","ajaxStop","ajaxComplete","ajaxError","ajaxSuccess","ajaxSend"],function(e,t){S.fn[t]=function(e){return this.on(t,e)}}),S.fn.extend({bind:function(e,t,n){return this.on(e,null,t,n)},unbind:function(e,t){return this.off(e,null,t)},delegate:function(e,t,n,r){return this.on(t,e,n,r)},undelegate:function(e,t,n){return 
1===arguments.length?this.off(e,"**"):this.off(t,e||"**",n)},hover:function(e,t){return this.mouseenter(e).mouseleave(t||e)}}),S.each("blur focus focusin focusout resize scroll click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup contextmenu".split(" "),function(e,n){S.fn[n]=function(e,t){return 0" ); 44 | $('#predicted-prompts').append( '

You : Predicted Responses

'); 45 | prompts.forEach((p)=>{ 46 | p = p.generated_text.replace(context, '').trim(); 47 | // console.log(p); 48 | 49 | //Select first complete message 50 | p = p.split('#')[0]; 51 | $('#predicted-prompts').append( "

"+p+"

" ); 52 | 53 | }) 54 | $('.prompt').on('mouseover',function(){ 55 | $('.prompt').css('background','none'); 56 | $(this).css('background','#083d53'); 57 | }) 58 | 59 | // $('.prompt')[0].trigger('mouseover'); 60 | const mouseoverEvent = new Event('mouseover'); 61 | document.querySelector('.prompt').dispatchEvent(mouseoverEvent); 62 | var currentSelectedPrompt = 0; 63 | var totalPrompts = prompts.length; 64 | 65 | $('.prompt').on('click',function(){ 66 | document.querySelector('body').removeEventListener('keydown',togglePrompt); 67 | 68 | var currentMessage = $('div[data-tab="6"]').text(); 69 | $('div[data-tab="6"]').text(''); 70 | currentMessage = currentMessage.trim() + ' ' + $(this).text(); 71 | $('div[data-tab="6"]').focus(); 72 | document.execCommand('insertText', false, currentMessage); 73 | $('#predicted-prompts').remove(); 74 | $('div[data-tab="6"]').siblings().hide(); 75 | }) 76 | 77 | function togglePrompt(e){ 78 | console.log($('#predicted-prompts').length); 79 | if($('#predicted-prompts').length){ 80 | e.preventDefault(); 81 | e.stopPropagation(); 82 | 83 | if (e.keyCode===38){ 84 | // console.log('up'); 85 | currentSelectedPrompt-=1; 86 | if (currentSelectedPrompt<0){ 87 | currentSelectedPrompt = totalPrompts-1; 88 | } 89 | console.log('CURRENT_PROMPT_NO(up)',currentSelectedPrompt); 90 | document.querySelectorAll('.prompt')[currentSelectedPrompt].dispatchEvent(mouseoverEvent); 91 | 92 | } 93 | else if(e.keyCode===40){ 94 | // console.log('down'); 95 | currentSelectedPrompt+=1; 96 | if (currentSelectedPrompt>=totalPrompts){ 97 | currentSelectedPrompt = 0; 98 | } 99 | console.log('CURRENT_PROMPT_NO(down)',currentSelectedPrompt); 100 | document.querySelectorAll('.prompt')[currentSelectedPrompt].dispatchEvent(mouseoverEvent); 101 | 102 | } 103 | else if(e.keyCode===27){ 104 | // console.log('escape'); 105 | $('#predicted-prompts').remove(); 106 | document.querySelector('body').removeEventListener('keydown',togglePrompt); 107 | 
console.log('CURRENT_PROMPT_NO(escape)',currentSelectedPrompt); 108 | 109 | var currentMessage = $('div[data-tab="6"]').text(); 110 | $('div[data-tab="6"]').text(''); 111 | $('div[data-tab="6"]').focus(); 112 | document.execCommand('insertText', false, currentMessage); 113 | 114 | } 115 | else if(e.keyCode===13){ 116 | // console.log('enter'); 117 | document.querySelector('body').removeEventListener('keydown',togglePrompt); 118 | console.log('CURRENT_PROMPT_NO(enter)',currentSelectedPrompt); 119 | document.querySelectorAll('.prompt')[currentSelectedPrompt].click(); 120 | } 121 | } 122 | } 123 | 124 | document.querySelector('body').addEventListener('keydown',togglePrompt); 125 | 126 | 127 | 128 | 129 | document.querySelector('[data-tab="7"]').scrollIntoView(false); 130 | } 131 | 132 | function getPrompts(context){ 133 | $.ajax({ 134 | url: 'http://localhost:5000/autocomplete', 135 | crossDomain: true, 136 | dataType: 'json', 137 | data: {context : context}, 138 | success: (d)=>{ 139 | // console.log(d); 140 | displayPrompts(d.outputs, context); 141 | } 142 | }); 143 | } 144 | 145 | 146 | $(document).ready(function(){ 147 | 148 | console.log("Hello, you!"); 149 | 150 | var icon = '

YOU

active

' 151 | $('body').append(icon) 152 | 153 | 154 | var tabListernerActive = false; 155 | var interval = setInterval(function(){ 156 | // document.body.style.border = '5px solid red'; 157 | if(tabListernerActive===false){ 158 | if ($('[data-tab="6"]').length>0){ 159 | console.log($('[data-tab="6"]'), 'yoyo') 160 | console.log('Adding Event Listeners for tabs') 161 | $('[data-tab="6"]').on('keydown',function(e){ 162 | if(e.keyCode===9){ 163 | e.stopPropagation(); 164 | e.preventDefault(); 165 | console.log('Tab!') 166 | $('[data-tab="6"]').blur(); 167 | var context = generateContext(); 168 | getPrompts(context); 169 | } 170 | }) 171 | 172 | $('[data-tab="6"]').on('keypress',function(e){ 173 | if(e.keyCode===9){ 174 | e.stopPropagation(); 175 | e.preventDefault(); 176 | } 177 | }) 178 | 179 | tabListernerActive = true; 180 | console.log('cleared'); 181 | clearInterval(interval); 182 | 183 | } 184 | } 185 | }, 1000); 186 | }); 187 | 188 | 189 | 190 | -------------------------------------------------------------------------------- /extension/icons48.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nuwandavek/you/9113fc3cabf2a7de7718938e93961faff7b7d8ee/extension/icons48.png -------------------------------------------------------------------------------- /extension/icons96.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nuwandavek/you/9113fc3cabf2a7de7718938e93961faff7b7d8ee/extension/icons96.png -------------------------------------------------------------------------------- /extension/manifest.json: -------------------------------------------------------------------------------- 1 | { 2 | 3 | "manifest_version": 2, 4 | "name": "You", 5 | "version": "1.0", 6 | 7 | "description": "Only You can complete your sentences :)", 8 | 9 | "icons": { 10 | "48": "icons48.png", 11 | "96": "icons96.png" 12 | }, 13 | "permissions": [ 14 | 
"activeTab", 15 | "webRequest", 16 | "<all_urls>", 17 | "http://localhost/*" 18 | ], 19 | "browser_action": { 20 | "default_icon": "icons48.png", 21 | "default_title": "You", 22 | "default_popup": "popup/popup.html" 23 | }, 24 | "content_scripts": [ 25 | { 26 | "matches": ["https://web.whatsapp.com/"], 27 | "js": ["content_scripts/jquery.min.js","content_scripts/you.js"] 28 | } 29 | ] 30 | 31 | } -------------------------------------------------------------------------------- /extension/popup/popup.css: -------------------------------------------------------------------------------- 1 | html, body { 2 | width: 300px; 3 | background : #262d31; 4 | font-family: 'Roboto'; 5 | color: #eee; 6 | text-align: center; 7 | 8 | } 9 | .jumbo{ 10 | font-size: 30px; 11 | font-weight: 900; 12 | } 13 | -------------------------------------------------------------------------------- /extension/popup/popup.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 18 | 19 | 20 | 21 | -------------------------------------------------------------------------------- /notebooks/Untitled.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import transformers" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 2, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "tokenizer = transformers.GPT2Tokenizer.from_pretrained(\"distilgpt2\")" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 14, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "txt = '''Vivek is 😅'''\n", 28 | "\n" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 20, 34 | "metadata": {}, 35 | "outputs": [ 36 | { 37 | "data": { 38 | "text/plain": [ 39 | "[53, 425, 74, 318, 12520, 11805]" 40 | ] 41 
| }, 42 | "execution_count": 20, 43 | "metadata": {}, 44 | "output_type": "execute_result" 45 | } 46 | ], 47 | "source": [ 48 | "a = tokenizer(txt)['input_ids']\n", 49 | "a" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 23, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "V\n", 62 | "ive\n", 63 | "k\n", 64 | " is\n", 65 | " �\n", 66 | "��\n" 67 | ] 68 | } 69 | ], 70 | "source": [ 71 | "for t in a:\n", 72 | " print(tokenizer.decode(t))" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 2, 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "from transformers import pipeline" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 3, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "ft_generator = pipeline('text-generation', model='../output')" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 4, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "name": "stderr", 100 | "output_type": "stream", 101 | "text": [ 102 | "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" 103 | ] 104 | }, 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "[{'generated_text': 'Vivek: Yo icedup? 😏2/27/19, 3:38 PM - Rishi Amreeka: I saw the trailer with a guy with a kid!2/27/19, 3:38 PM -'},\n", 109 | " {'generated_text': \"Vivek: Yo ive klept4/8/20, 9:57 PM - Rishi Amreeka: Yeah the answer will be like, can you do that? And you can just call one who you've heard about\"},\n", 110 | " {'generated_text': 'Vivek: Yo ik, I only had 4 kids ik0:10 AM - Rishi Amreeka: Haha! 
:D 🤣🤣🤣🤣9/22/19, 1'}]" 111 | ] 112 | }, 113 | "execution_count": 4, 114 | "metadata": {}, 115 | "output_type": "execute_result" 116 | } 117 | ], 118 | "source": [ 119 | "ft_generator(\"Vivek: Yo \", max_length=50, num_return_sequences=3)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [] 128 | } 129 | ], 130 | "metadata": { 131 | "kernelspec": { 132 | "display_name": "Python 3", 133 | "language": "python", 134 | "name": "python3" 135 | }, 136 | "language_info": { 137 | "codemirror_mode": { 138 | "name": "ipython", 139 | "version": 3 140 | }, 141 | "file_extension": ".py", 142 | "mimetype": "text/x-python", 143 | "name": "python", 144 | "nbconvert_exporter": "python", 145 | "pygments_lexer": "ipython3", 146 | "version": "3.6.9" 147 | } 148 | }, 149 | "nbformat": 4, 150 | "nbformat_minor": 4 151 | } 152 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | Flask 2 | flask-cors 3 | torch==1.6.0 4 | transformers==4.2.0 -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from flask import Flask, jsonify, request 4 | from flask_cors import CORS 5 | from transformers import pipeline 6 | 7 | app = Flask(__name__) 8 | CORS(app) 9 | 10 | def set_up_gen_pipeline(model_path): 11 | global gen_pipeline 12 | gen_pipeline = pipeline('text-generation', model=model_path, framework='pt') 13 | 14 | @app.route("/") 15 | def hello(): 16 | res = jsonify({ 17 | "hello": "world!" 
18 | }) 19 | return res 20 | 21 | @app.route("/autocomplete") 22 | def prompt(): 23 | context = request.args.get('context', default = '', type = str) 24 | print(f'context = {context}') 25 | outputs = gen_pipeline(context, max_length=200, num_return_sequences=3, do_sample=True, eos_token_id=2, pad_token_id=0, 26 | skip_special_tokens=True, top_k=50, top_p=0.95) 27 | print(f'outputs = {outputs}') 28 | 29 | 30 | res = jsonify({ 31 | "outputs": outputs 32 | }) 33 | return res 34 | 35 | if __name__ == '__main__': 36 | if len(sys.argv) < 2: 37 | print("Missing required argument: model path.") 38 | sys.exit(1) 39 | set_up_gen_pipeline(sys.argv[1]) 40 | app.run(host="localhost", port=5000, debug=True) 41 | --------------------------------------------------------------------------------
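The `/autocomplete` route returns the raw pipeline output, and `you.js` then strips the context and keeps only the text before the first `#`. A minimal sketch of that same post-processing in Python, useful for checking model output without the extension. The function name `first_messages` and the sample data are hypothetical and not part of the repository.

```python
from typing import Dict, List


def first_messages(outputs: List[Dict[str, str]], context: str, sep: str = "#") -> List[str]:
    """Mirror the post-processing that you.js applies to the server's `outputs` list."""
    messages = []
    for item in outputs:
        # Drop only the first occurrence of the context, like String.replace in JS.
        text = item["generated_text"].replace(context, "", 1).strip()
        # Keep the first complete message; you.js splits on '#' for this.
        messages.append(text.split(sep)[0])
    return messages


if __name__ == "__main__":
    # Hypothetical sample shaped like the pipeline's return value.
    sample = [{"generated_text": "Vivek: Yo what's up?#Rishi: nm"}]
    print(first_messages(sample, "Vivek: Yo "))
```

Doing this step on the server instead would shrink the JSON response, at the cost of changing what the extension receives.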