├── gen.png
├── cover.gif
├── noise2.jpg
├── gaussian_noise.png
├── README.md
└── AI_Generated_Characters.ipynb
/gen.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mitmedialab/AI-generated-characters/HEAD/gen.png
--------------------------------------------------------------------------------
/cover.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mitmedialab/AI-generated-characters/HEAD/cover.gif
--------------------------------------------------------------------------------
/noise2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mitmedialab/AI-generated-characters/HEAD/noise2.jpg
--------------------------------------------------------------------------------
/gaussian_noise.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mitmedialab/AI-generated-characters/HEAD/gaussian_noise.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # AI-generated-characters for Learning and Wellbeing
2 | Click [here](https://www.media.mit.edu/projects/Ai-generated-characters/overview/) for the full project page.
3 |
4 | This repository contains the source code for the paper [AI-generated characters for supporting personalized learning and well-being](https://www.nature.com/articles/s42256-021-00417-9) by Pat Pataranutaporn, Valdemar Danry, Joanne Leong, Parinya Punpongsanon, Dan Novy, Pattie Maes & Misha Sra. It combines previous work on AI-generated characters, including [Siarohin et al.](https://github.com/AliaksandrSiarohin/first-order-model), [Prajwal et al.](https://github.com/Rudrabha/Wav2Lip), and [Corentin](https://github.com/CorentinJ/Real-Time-Voice-Cloning).
5 |
6 | ## Colab Demo
7 | The code is available as a Google Colab notebook; see ```AI_Generated_Characters.ipynb```. To run it, press the ```Open In Colab``` button.
8 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1y0YigI1RiTVd2Qr6HHpesAwYoovcvZaE?usp=sharing)
9 |
10 | ## Examples of Outputs
11 | ![Screenshot](cover.gif)
12 |
13 | With this pipeline, one can easily create a video of an AI-generated character from a character image, a driving video, and audio.
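## Pipeline Sketch

For reference, the snippet below is a minimal sketch of the two core steps the Colab notebook performs: animating a character image with first-order-model, then lip-syncing the result to an audio track with Wav2Lip. It assumes both repositories are cloned and their pretrained checkpoints downloaded as in the notebook's installation cell; file names such as ```source.png```, ```driving.mp4```, and ```speech.wav``` are placeholders, not files shipped with this repository.

```python
# Run from inside the first-order-model checkout.
import imageio
from skimage import img_as_ubyte
from skimage.transform import resize
from demo import load_checkpoints, make_animation  # first-order-model helpers

# 1) Animate the character image with the driving video.
source_image = resize(imageio.imread("source.png"), (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3]
                 for frame in imageio.mimread("driving.mp4", memtest=False)]
generator, kp_detector = load_checkpoints(config_path="config/vox-256.yaml",
                                          checkpoint_path="vox-cpk.pth.tar")
predictions = make_animation(source_image, driving_video, generator, kp_detector,
                             relative=True, adapt_movement_scale=False)
imageio.mimsave("animated.mp4", [img_as_ubyte(frame) for frame in predictions], fps=30)

# 2) Lip-sync the animated clip to an audio track with Wav2Lip
#    (shell command, run from the Wav2Lip checkout):
#    python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth \
#        --face animated.mp4 --audio speech.wav
```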
14 | 15 | 16 | -------------------------------------------------------------------------------- /AI_Generated_Characters.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "bKvX_88sNgXs" 7 | }, 8 | "source": [ 9 | "# AI Generated Characters for Learning and Wellbeing" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "nO7671Y9oXiW" 16 | }, 17 | "source": [ 18 | "Website: https://www.media.mit.edu/projects/ai-generated-characters/overview/\n", 19 | "\n", 20 | "Paper: https://www.nature.com/articles/s42256-021-00417-9\n", 21 | "\n", 22 | "Github: https://github.com/mitmedialab/AI-generated-characters\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "source": [ 28 | "![](https://drive.google.com/uc?export=view&id=17arRYqt6QyEjkj4-5eDrqRPcteTsbheO)\n" 29 | ], 30 | "metadata": { 31 | "id": "9M320pz78nl7" 32 | } 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": { 37 | "id": "2XSW0Yc6Nq-F" 38 | }, 39 | "source": [ 40 | "*This notebook is a combination of previous work on AI generated characters compiled into one easy to use pipeline that include [Siarohin et al.](https://github.com/AliaksandrSiarohin/first-order-model), [Prajwal et al.](https://github.com/Rudrabha/Wav2Lip), and [Corentin](https://github.com/CorentinJ/Real-Time-Voice-Cloning). Please go check out their amazing work.*" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": { 46 | "id": "tbvav0P_NNqj" 47 | }, 48 | "source": [ 49 | "**Licensed under the MIT License**\n", 50 | "\n", 51 | "\n", 52 | "Copyright (c) 2021 MIT Media Lab\n", 53 | "\n", 54 | "Permission is hereby granted, free of charge, to any person obtaining a copy\n", 55 | "of this software and associated documentation files (the \"Software\"), to deal\n", 56 | "in the Software without restriction, including without limitation the rights\n", 57 | "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n", 58 | "copies of the Software, and to permit persons to whom the Software is\n", 59 | "furnished to do so, subject to the following conditions:\n", 60 | "The above copyright notice and this permission notice shall be included in\n", 61 | "all copies or substantial portions of the Software.\n", 62 | "\n", 63 | "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n", 64 | "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n", 65 | "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n", 66 | "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n", 67 | "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n", 68 | "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n", 69 | "THE SOFTWARE." 
70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": { 76 | "id": "t-npGOwrGNhs", 77 | "cellView": "form" 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "#@markdown #**Installation of libraries**\n", 82 | "# @markdown This cell will take a little while because it has to download several libraries.\n", 83 | "%cd \"/content\" \n", 84 | "import requests\n", 85 | "\n", 86 | "print(\"Downloading Packages\")\n", 87 | "# Character Images\n", 88 | "!gdown --id \"16HzQKA4e3vpLY8Em57WnE8UwIE591aF1\" -O \"/content/mona_lisa.png\" &> /dev/null\n", 89 | "!gdown --id \"1cgfFgzm4BrqKIkyspGib6u4ty5ReyeM_\" -O \"/content/einstein.png\" &> /dev/null\n", 90 | "!gdown --id \"10N3e5E0R1aYcLVmE_dmtMCSYVFGQLTeq\" -O \"/content/lincoln.png\" &> /dev/null\n", 91 | "!gdown --id \"1-BeSNGGjJADs5W-Rn6izAteuVzJcnhW1\" -O \"/content/nietzsche.png\" &> /dev/null\n", 92 | "!gdown --id \"1zPPUQ7xgbhnpVNl26J1Gl6rXlJ6g0rK7\" -O \"/content/sokrates.png\" &> /dev/null\n", 93 | "!gdown --id \"1mzzEdXEOohLcpr8L01JzOVbirEMJogni\" -O \"/content/van_gogh.png\" &> /dev/null\n", 94 | "\n", 95 | "# Face Cropping\n", 96 | "!wget \"https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_alt2.xml\" -O \"/content/haarcascade_frontalface_alt2.xml\" &> /dev/null\n", 97 | "\n", 98 | "# Wav2Lip\n", 99 | "!git clone \"https://github.com/Rudrabha/Wav2Lip.git\"\n", 100 | "!wget \"https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth\" -O \"Wav2Lip/face_detection/detection/sfd/s3fd.pth\" &> /dev/null\n", 101 | "!gdown --id \"1IKhxXy0mplOpGFWLH9_uUhBoIplao8j0\" -O \"/content/Wav2Lip/checkpoints/wav2lip_gan.pth\" &> /dev/null\n", 102 | "\n", 103 | "# First-Order-Model\n", 104 | "!git clone \"https://github.com/AliaksandrSiarohin/first-order-model\"\n", 105 | "!gdown --id \"19d9ZJYAMsNNQZd4AzIWCw4sF1EaNYuJ3\" -O \"/content/first-order-model/vox-cpk.pth.tar\" &> /dev/null\n", 106 | "\n", 107 | "# Template Data\n", 108 | "#!gdown --id \"1Qod7I5hiK1nCPsHBqAdK6hoYZgNzQPHi\" -O \"driving_video_long.mp4\"\n", 109 | "!gdown --id \"1o2zD5xky8F6wZ21PkeG5KhJOlSdkeEpm\" -O \"driving_video.mp4\" &> /dev/null\n", 110 | "\n", 111 | "# Watermark\n", 112 | "url = 'https://raw.githubusercontent.com/mitmedialab/AI-generated-characters/main/gen.png'\n", 113 | "r = requests.get(url, allow_redirects=True) \n", 114 | "open('gen.png', 'wb').write(r.content)\n", 115 | "\n", 116 | "# Noise\n", 117 | "url = 'https://raw.githubusercontent.com/mitmedialab/AI-generated-characters/main/noise2.jpg'\n", 118 | "r = requests.get(url, allow_redirects=True)\n", 119 | "open('noise_2.png', 'wb').write(r.content)\n", 120 | "\n", 121 | "\n", 122 | "print(\"Installing required libraries\")\n", 123 | "!pip install -r Wav2Lip/requirements.txt -y &> /dev/null\n", 124 | "!pip uninstall tensorflow tensorflow-gpu -y &> /dev/null\n", 125 | "!pip install ffmpeg -y &> /dev/null\n", 126 | "!pip install https://github.com/tugstugi/dl-colab-notebooks/archive/colab_utils.zip &> /dev/null\n", 127 | "\n", 128 | "\n", 129 | "# General Functions\n", 130 | "print(\"Loading Libraries and functions\")\n", 131 | "import sys\n", 132 | "import numpy as np\n", 133 | "import ipywidgets as widgets\n", 134 | "from io import StringIO\n", 135 | "from IPython import get_ipython\n", 136 | "from IPython.display import display, Audio, clear_output\n", 137 | "from dl_colab_notebooks.audio import record_audio, upload_audio\n", 138 | "from scipy.io import wavfile\n", 139 | "\n", 140 | "class IpyExit(SystemExit):\n", 141 | " 
\"\"\"\n", 142 | " Exit Exception for IPython.\n", 143 | " Exception temporarily redirects stderr to buffer.\n", 144 | " \"\"\"\n", 145 | " def __init__(self):\n", 146 | " print(\"Error: Please only select one input. If you will not use text please leave text field empty.\")\n", 147 | " sys.stderr = StringIO()\n", 148 | "\n", 149 | " def __del__(self):\n", 150 | " sys.stderr.close()\n", 151 | " sys.stderr = sys.__stderr__ # restore from backup\n", 152 | "\n", 153 | "from google.colab import files\n", 154 | "def getLocalFiles():\n", 155 | " uploaded = files.upload()\n", 156 | " filename = next(iter(uploaded))\n", 157 | " return filename\n", 158 | "\n", 159 | "\n", 160 | "# First-order-model\n", 161 | "import imageio\n", 162 | "import cv2\n", 163 | "import numpy as np\n", 164 | "import matplotlib.pyplot as plt\n", 165 | "import matplotlib.animation as animation\n", 166 | "from skimage.transform import resize\n", 167 | "from IPython.display import HTML\n", 168 | "import warnings\n", 169 | "warnings.filterwarnings(\"ignore\")\n", 170 | "\n", 171 | "def _compute_embedding(audio):\n", 172 | " display(Audio(audio, rate=SAMPLE_RATE, autoplay=True))\n", 173 | " global embedding\n", 174 | " embedding = None\n", 175 | " embedding = encoder.embed_utterance(encoder.preprocess_wav(audio, SAMPLE_RATE))\n", 176 | "\n", 177 | "def _record_audio(b):\n", 178 | " clear_output()\n", 179 | " audio = record_audio(record_seconds, sample_rate=SAMPLE_RATE)\n", 180 | " #_compute_embedding(audio)\n", 181 | " display(Audio(audio, rate=SAMPLE_RATE, autoplay=True))\n", 182 | " wavfile.write('driving_audio.wav', SAMPLE_RATE, (32767*audio).astype(np.int16))\n", 183 | "\n", 184 | "def _upload_audio(b):\n", 185 | " clear_output()\n", 186 | " audio = upload_audio(sample_rate=SAMPLE_RATE)\n", 187 | " _compute_embedding(audio)\n", 188 | "\n", 189 | "def trim_img(img_src):\n", 190 | " \n", 191 | " import imutils\n", 192 | "\n", 193 | " # Read the Input Image\n", 194 | " img = cv2.imread(img_src)\n", 195 | " img = imutils.resize(img, width=400) \n", 196 | "\n", 197 | " # Convert into grayscale\n", 198 | " gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)\n", 199 | "\n", 200 | " # Trim to 400x400\n", 201 | " face_cascade = cv2.CascadeClassifier('/content/haarcascade_frontalface_alt2.xml')\n", 202 | " faces = face_cascade.detectMultiScale(gray, 1.1, 4)\n", 203 | " try:\n", 204 | " for (x, y, w, h) in faces:\n", 205 | " extention = 40\n", 206 | " faces = img[y-extention:y + h+extention, x-extention:x + w + extention]\n", 207 | " cv2.imwrite('/content/img_trimmed.png', faces)\n", 208 | " except:\n", 209 | " print(\"Error: Face takes too much space on image. 
Try a different image, or trim it yourself to 400x400.\")\n", 210 | "\n", 211 | " return \"/content/img_trimmed.png\"\n", 212 | "\n", 213 | "\n", 214 | "def animate_video(img_filename, vid_filename):\n", 215 | " %cd /content/first-order-model/\n", 216 | " \n", 217 | " from demo import make_animation\n", 218 | " from demo import load_checkpoints\n", 219 | " from skimage import img_as_ubyte\n", 220 | "\n", 221 | " source_image = imageio.imread(img_filename)\n", 222 | " driving_video = imageio.mimread(vid_filename, fps=30, memtest=False) \n", 223 | "\n", 224 | " # Resize image and video to 256x256\n", 225 | " source_image = resize(source_image, (256, 256))[..., :3]\n", 226 | " driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]\n", 227 | "\n", 228 | " # Load Model\n", 229 | " generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml', checkpoint_path='/content/first-order-model/vox-cpk.pth.tar')\n", 230 | "\n", 231 | " # Make Animation\n", 232 | " predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True,\n", 233 | " adapt_movement_scale=False)\n", 234 | " #save resulting video\n", 235 | " imageio.mimsave('/content/vidvid.mp4', [img_as_ubyte(frame) for frame in predictions], fps=30)\n", 236 | "\n", 237 | " %cd /content\n", 238 | "\n", 239 | "\n", 240 | "def tracability(video_filename):\n", 241 | " import moviepy.editor as mp\n", 242 | "\n", 243 | " video = mp.VideoFileClip(video_filename)\n", 244 | "\n", 245 | " machine = (mp.ImageClip('/content/noise_2.png')\n", 246 | " .set_duration(video.duration)\n", 247 | " .set_opacity(.05)\n", 248 | " .resize(height = 552) #\n", 249 | " .margin(right = 0, top = 0, opacity = 1.0)\n", 250 | " .set_pos((\"center\", \"center\")))\n", 251 | " \n", 252 | " human = (mp.ImageClip('/content/gen.png')\n", 253 | " .set_duration(video.duration)\n", 254 | " .resize(height = 50) #\n", 255 | " .margin(right = 0, top = 0, opacity = 1.0)\n", 256 | " .set_pos((\"left\", \"bottom\")))\n", 257 | "\n", 258 | " final = mp.CompositeVideoClip([video, machine, human])\n", 259 | " final.write_videofile(\"/content/marked.mp4\")\n", 260 | "\n", 261 | "print(\"Succesfully Finished Installing Libraries\")" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": { 268 | "id": "rEvjGeswFb1Z", 269 | "cellView": "form" 270 | }, 271 | "outputs": [], 272 | "source": [ 273 | "#@markdown #**Choose Character**\n", 274 | "\n", 275 | "# TO DO: Show Images of Characters one can choose.\n", 276 | "\n", 277 | "# @markdown Choose the character which you want to animate. 
If you have any requests for new characters to animate, please let us know here: patpat@mit.edu\n", 278 | "character = 'Mona Lisa' #@param [\"Van Gogh\", \"Mona Lisa\", \"Einstein\", \"Lincoln\", \"Nietzsche\", \"Sokrates\", \"Upload Your Own\"]\n", 279 | "print(f\"{character} selected.\")\n", 280 | "\n", 281 | "if character == \"Upload Your Own\":\n", 282 | " character_img = \"/content/\"+getLocalFiles()\n", 283 | " if cv2.imread(character_img).shape[0] != cv2.imread(character_img).shape[1]:\n", 284 | " print(\"Cropping uploaded image\")\n", 285 | " character_img = trim_img(character_img)\n", 286 | "\n", 287 | "else:\n", 288 | " character = character.lower().replace(\" \", \"_\") # make lowercase and remove spacing\n", 289 | " character_img = \"/content/\"+character+\".png\"" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": { 296 | "id": "xGIhE54sFPXG", 297 | "cellView": "form" 298 | }, 299 | "outputs": [], 300 | "source": [ 301 | "#@markdown #**Choose Inputs**\n", 302 | "# @markdown Please select one of the available inputs. Leave the text field empty if you want to animate the character with audio or video.\n", 303 | "\n", 304 | "\n", 305 | "#Welcome. Today we will learn about the Theory of Relativity. I first came up with this method when...\n", 306 | "text = \"\" #@param {type:\"string\"}\n", 307 | "#@markdown --\n", 308 | "audio = True #@param {type:\"boolean\"}\n", 309 | "#@markdown * Either record audio from microphone or upload audio from file (.mp3 or .wav) \n", 310 | "record_or_upload = \"Record\" #@param [\"Record\", \"Upload (.mp3 or .wav)\"]\n", 311 | "record_seconds = 5#@param {type:\"number\", min:1, max:10, step:1}\n", 312 | "#@markdown --\n", 313 | "video = False #@param {type:\"boolean\"}\n", 314 | "\n", 315 | "if text != \"\" and audio or text !=\"\" and video or audio and video:\n", 316 | " raise IpyExit\n", 317 | "\n", 318 | "\n", 319 | "if video:\n", 320 | " print(\"Please upload the video you wish to drive the animation with:\\n\")\n", 321 | " video_driver = \"/content/\"+getLocalFiles()\n", 322 | "\n", 323 | " #to do: make sure only supported video formats can be uploaded\n", 324 | "\n", 325 | "elif audio:\n", 326 | "\n", 327 | " SAMPLE_RATE = 22050\n", 328 | " embedding = None\n", 329 | "\n", 330 | " if record_or_upload == \"Record\":\n", 331 | " print(\"Please record the audio you wish to drive the animation with. Remember to enable your microphone in Chrome:\\n\")\n", 332 | " button = widgets.Button(description=\"Record Your Voice\")\n", 333 | " button.on_click(_record_audio) \n", 334 | " display(button)\n", 335 | " audio_driver = \"/content/driving_audio.wav\"\n", 336 | " else:\n", 337 | " print(\"Please upload the audio you wish to drive the animation with:\\n\")\n", 338 | " audio_driver = \"/content/\"+getLocalFiles()\n", 339 | " video_driver = \"/content/driving_video.mp4\"\n", 340 | "\n", 341 | "elif text:\n", 342 | " print(\"Text is currently unsupported but will be soon.. Please use either audio or video inputs for now.\")" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": { 349 | "id": "Orsq_D2RLvo2", 350 | "cellView": "form" 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "from numpy.core import memmap\n", 355 | "import shutil\n", 356 | "\n", 357 | "\n", 358 | "#@markdown #**Generate Character**\n", 359 | "#@markdown This is likely to take a while depending on the length of your driving video. 
First we generate the movements of the character using the first-order-model approach; then, if audio or text was given as input, we either synthesize audio from the text or use the provided audio to make the character lip-sync to it using Wav2Lip.\n",
360 | "%cd /content\n",
361 | "print(\"Animating Character with Driving Video: This might take a few minutes...\")\n",
362 | "animate_video(character_img, video_driver) # writes the animated clip to /content/vidvid.mp4\n",
363 | "final_video_driver = \"/content/vidvid.mp4\"\n",
364 | "\n",
365 | "if text != \"\":\n",
366 | "  print(\"Generating speech from text\")\n",
367 | "  # generate audio\n",
368 | "  #audio_driver = _GENERATED AUDIO.wav_\n",
369 | "  audio = True\n",
370 | "\n",
371 | "if audio:\n",
372 | "  print(\"Lip-syncing Character with Audio\")\n",
373 | "  # Using Wav2Lip\n",
374 | "  %cd /content/Wav2Lip\n",
375 | "  !python inference.py --checkpoint_path \"/content/Wav2Lip/checkpoints/wav2lip_gan.pth\" --face $final_video_driver --audio $audio_driver &> /dev/null\n",
376 | "  %cd /content\n",
377 | "  final_video_driver = \"/content/Wav2Lip/results/result_voice.mp4\"\n",
378 | "else:\n",
379 | "  audio_driver = \"/content/driver.wav\"\n",
380 | "  !ffmpeg -i $video_driver -q:a 0 -map 0:a \"/content/driver.wav\" -y &> /dev/null\n",
381 | "  !ffmpeg -i $final_video_driver -i $audio_driver -c:v copy -c:a aac merged.mp4 -y &> /dev/null\n",
382 | "  final_video_driver = \"merged.mp4\"\n",
383 | "\n",
384 | "# Traceability\n",
385 | "tracability(final_video_driver)\n",
386 | "final_video_driver = \"marked.mp4\"\n",
387 | "!ffmpeg -i $final_video_driver -i $audio_driver final_generated.mp4 -y &> /dev/null\n",
388 | "!ffmpeg -i $final_video_driver ai_generated_character.mp4 -y &> /dev/null\n",
389 | "final_video_driver = \"ai_generated_character.mp4\"\n",
390 | "\n",
391 | "# display result\n",
392 | "from IPython.display import HTML\n",
393 | "from base64 import b64encode\n",
394 | "mp4 = open(\"/content/final_generated.mp4\",'rb').read()\n",
395 | "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
396 | "HTML(\"\"\"\n",
397 | "<video controls>\n",
398 | "  <source src=\"%s\" type=\"video/mp4\">\n",
399 | "</video>\n",
400 | "\"\"\" % data_url)"
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "source": [
406 | "#@markdown ### **Download the Generated Video**\n",
407 | "#@markdown Run this cell to download your generated video. If you wish to change the AI-generated character or the input, please go back to the relevant cell and rerun from there. You can skip the **Installation of libraries** section.\n",
408 | "\n",
409 | "from google.colab import files\n",
410 | "files.download(final_video_driver)"
411 | ],
412 | "metadata": {
413 | "cellView": "form",
414 | "id": "vSRqXk10zCDw"
415 | },
416 | "execution_count": null,
417 | "outputs": []
418 | }
419 | ],
420 | "metadata": {
421 | "accelerator": "GPU",
422 | "colab": {
423 | "collapsed_sections": [],
424 | "machine_shape": "hm",
425 | "name": "AI_Generated_Characters.ipynb",
426 | "provenance": []
427 | },
428 | "kernelspec": {
429 | "display_name": "Python 3",
430 | "name": "python3"
431 | }
432 | },
433 | "nbformat": 4,
434 | "nbformat_minor": 0
435 | }
--------------------------------------------------------------------------------