├── README.md ├── analysis.ipynb ├── build_dataset.ipynb ├── emoji_demo ├── .DS_Store ├── AppDelegate.h ├── AppDelegate.mm ├── RunModel-Info.plist ├── RunModelViewController.h ├── RunModelViewController.mm ├── RunModelViewController.xib ├── data │ ├── .DS_Store │ └── emoji_frozen.pb ├── ios_image_load.h ├── ios_image_load.mm ├── main.mm └── tf_ios_makefile_example.xcodeproj │ ├── project.pbxproj │ ├── project.xcworkspace │ ├── contents.xcworkspacedata │ └── xcuserdata │ │ └── h4x.xcuserdatad │ │ └── UserInterfaceState.xcuserstate │ └── xcuserdata │ └── h4x.xcuserdatad │ ├── xcdebugger │ └── Breakpoints_v2.xcbkptlist │ └── xcschemes │ ├── tf_ios_makefile_example.xcscheme │ └── xcschememanagement.plist ├── export_tf_model.ipynb ├── extract.py ├── extract_all.sh ├── p5-40-test.hdf5 ├── replayer.ipynb ├── stats_top.py ├── tokenize_dataset.ipynb └── train.py /README.md: -------------------------------------------------------------------------------- 1 | # Emoji TensorFlow-iOS 2 | 3 | This is a TensorFlow demo that can be run on iOS. It implements a text classifier 4 | that can predict emoji from short text (like tweets). 5 | 6 | Presentation: `TensorFlow on iOS.pdf` 7 | 8 | ## License 9 | 10 | [![License: CC BY-NC 4.0](https://licensebuttons.net/l/by-nc/4.0/80x15.png)](http://creativecommons.org/licenses/by-nc/4.0/) 11 | 12 | 13 | ## Prerequisites 14 | 15 | * [TensorFlow (installation & source code)](https://www.tensorflow.org) 16 | * [Keras](https://keras.io) 17 | * Python 3 18 | * Xcode 19 | * Jupyter (for viewing ipynb files) 20 | * [Twitter data](https://archive.org/search.php?query=twitterstream) 21 | 22 | ## How to train the model? 23 | 24 | I have included a pretrained Keras model in this repository (p5-40-test.hdf5) that 25 | you can play with. But in case you want to train it yourself, here is a brief 26 | guide. 27 | 28 | 1. Prepare training data. 29 | 1. Download and unzip the Twitter training data. 30 | 2. 
Modify the `$INPUT` directory in `extract_all.sh` and run the script. Then you will get `data/extracted.list`. 31 | 3. Run `stats_top.py` to compute the top emojis, which are stored in `data/stat.txt`. 32 | 4. Open Jupyter notebook and run `build_dataset.ipynb`. It produces `data/dataset.pickle` with all sampled training data. 33 | 5. Run `tokenize_dataset.ipynb` to produce the tokenized dataset `data/plain_dataset.pickle` as well as metadata `data/plain_dataset_meta.pickle`. 34 | 2. Run `train.py` to train the model. High-end CUDA GPUs are recommended. The trained Keras model will be saved as `p5-40-test.hdf5`. 35 | 3. (Optional) You can try the model on arbitrary input with `replayer.ipynb`. 36 | 4. (Optional) Once you have trained a model, you can also visualize the training process with `tensorboard --logdir=.`. 37 | 38 | ## How to compile TensorFlow for iOS? 39 | 40 | Follow the official compile guide [here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ios_examples). 41 | 42 | ## How to run the model on iOS? 43 | 44 | Unfortunately, TensorFlow for iOS is still an alpha version, so we have to tweak it a 45 | little bit to make it work. 46 | 47 | ### Add additional integer Ops for LSTM. 
48 | 49 | Navigate to the `tensorflow/core/kernels` directory and change the code like this: 50 | 51 | At `cwise_op_add_1.cc`: 52 | 53 | // -- Original 54 | REGISTER5(BinaryOp, CPU, "Add", functor::add, float, Eigen::half, double, int32, 55 | int64); 56 | #if TENSORFLOW_USE_SYCL 57 | 58 | // -- Change to 59 | REGISTER5(BinaryOp, CPU, "Add", functor::add, float, Eigen::half, double, int32, 60 | int64); 61 | #if defined(__ANDROID_TYPES_SLIM__) 62 | REGISTER(BinaryOp, CPU, "Add", functor::add, int32); 63 | #endif // __ANDROID_TYPES_SLIM__ 64 | #if TENSORFLOW_USE_SYCL 65 | 66 | At `cwise_op_less.cc`: 67 | 68 | // -- Original 69 | REGISTER8(BinaryOp, CPU, "Less", functor::less, float, Eigen::half, double, 70 | int32, int64, uint8, int8, int16); 71 | #if GOOGLE_CUDA 72 | 73 | // -- Change to 74 | REGISTER8(BinaryOp, CPU, "Less", functor::less, float, Eigen::half, double, 75 | int32, int64, uint8, int8, int16); 76 | #if defined(__ANDROID_TYPES_SLIM__) 77 | REGISTER(BinaryOp, CPU, "Less", functor::less, int32); 78 | #endif // __ANDROID_TYPES_SLIM__ 79 | #if GOOGLE_CUDA 80 | 81 | Then compile TensorFlow again and you should no longer encounter the "No OpsKernel found" issue. 82 | 83 | ### Convert the Keras model to a TensorFlow model. 84 | 85 | Run `export_tf_model.ipynb` to convert the Keras model file `p5-40-test.hdf5` to a TensorFlow 86 | model: 87 | 88 | * GraphDef: `export/p5-40-test-serving/graph-serving.pb` 89 | * Checkpoint: `export/p5-40-test-serving/model-ckpt-*` 90 | 91 | Navigate to the `export/p5-40-test-serving` directory and run the following command to 92 | convert the model to a mobile version: 93 | python3 -m tensorflow.python.tools.freeze_graph \ 94 | --input_graph="graph-serving.pb" --input_checkpoint="model-ckpt" \ 95 | --output_graph="frozen.pb" --output_node_names="dense_2/Softmax" 96 | 97 | Finally, you will get the `frozen.pb` file, which will be used later. 98 | 99 | ### Run the model on iOS. 
100 | 101 | Copy the Xcode project `emoji_demo` to `tensorflow/tensorflow/contrib/ios_examples`. 102 | You should be able to compile and run it on iOS now. The demo itself includes a 103 | pretrained model at `data/emoji_frozen.pb`. To run your own model, replace that file 104 | with yours. 105 | 106 | -------------------------------------------------------------------------------- /analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "from matplotlib import pyplot as plt\n", 12 | "import pickle" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 2, 18 | "metadata": { 19 | "collapsed": true 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "emojis = []\n", 24 | "counts = []\n", 25 | "with open('data/stat.txt') as fin:\n", 26 | " for line in fin:\n", 27 | " emoji = line[0]\n", 28 | " cnt = int(line[2:])\n", 29 | " emojis.append(emoji)\n", 30 | " counts.append(cnt)" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 3, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "data": { 40 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZMAAAD8CAYAAACyyUlaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF4tJREFUeJzt3X+wX3Wd3/Hnq0Go4y+C3DIMgQY1uzvIbCOkSGfVodKF\ngDsGO6yF2ZGspUYrzKyz7ayhO1Msage349K6ozhYUoJVfizoklljMUVmnf4BcpEIQWW5YBiSiSRL\nELZlFwXe/eP7ufrleu9Ncj+X+70hz8fMme/5vs/nfM7nnpmbV87nnO/3pqqQJKnHPxj1ACRJBz/D\nRJLUzTCRJHUzTCRJ3QwTSVI3w0SS1M0wkSR1M0wkSd0ME0lSt8NGPYCFcvTRR9fy5ctHPQxJOqjc\ne++9f1NVY/tqd8iEyfLlyxkfHx/1MCTpoJLksf1p5zSXJKmbYSJJ6maYSJK67TNMkmxIsjvJtqHa\nTUm2tmV7kq2tvjzJ3w1t++LQPqcmeSDJRJLPJUmrH5VkS5KH2+vSVk9rN5Hk/iSnDPW1trV/OMna\n+TwhkqQDtz9XJtcBq4cLVfWvqmplVa0EbgW+NrT5kcltVfWRofrVwIeAFW2Z7HM9cEdVrQDuaO8B\nzhlqu67tT5KjgMuBtwOnAZdPBpAkaTT2GSZV9R1g73Tb2tXF+4EbZusjybHA66vqrhr8Na7rgfPa\n5jXAxra+cUr9+hq4Cziy9XM2sKWq9lbVU8AWpoSdJGlh9d4zeSfwRFU9PFQ7Mcl9Sf4qyTtb7Thg\nx1CbHa0GcExV7WrrPwGOGdrn8Wn2makuSRqR3s+ZXMhLr0p2ASdU1ZNJTgX+Islb97ezqqok8/Z3\nhJOsYzBFxgknnDBf3UqSppjzlUmSw4B/Cdw0Wauq56rqybZ+L/AI8GvATmDZ0O7LWg3giTZ9NTkd\ntrvVdwLHT7PPTPVfUVXXVNWqqlo1NrbPD3BKkuao58rkXwA/qqpfTF8lGQP2VtULSd7E4Ob5o1W1\nN8kzSU4H7gYuAv6s7bYJWAtc2V5vG6pfmuRGBjfbn66qXUluB/7z0E33s4DLOn6O/bJ8/TemrW+/\n8j0v96EladHbZ5gkuQE4Azg6yQ7g8qq6FriAX73x/i7giiQ/B14EPlJVkzfvP8rgybBXA99sCwxC\n5OYkFwOPMbihD7AZOBeYAJ4FPgjQgumTwD2t3RVDx5AkjcA+w6SqLpyh/vvT1G5l8KjwdO3HgZOn\nqT8JnDlNvYBLZuhrA7BhtnFLkhaOn4CXJHUzTCRJ3QwTSVI3w0SS1M0wkSR1M0wkSd0ME0lSN8NE\nktTNMJEkdTNMJEndDBNJUjfDRJLUzTCRJHUzTCRJ3QwTSVI3w0SS1M0wkSR1M0wkSd0ME0lSN8NE\nktTNMJEkddtnmCTZkGR3km1DtU8k2Zlka1vOHdp2WZKJJA8lOXuovrrVJpKsH6qfmOTuVr8pyeGt\nfkR7P9G2L9/XMSRJo7E/VybXAaunqV9VVSvbshkgyUnABcBb2z5fSLIkyRLg88A5wEnAha0twGda\nX28BngIubvWLgada/arWbsZjHNiPLUmaT/sMk6r6DrB3P/tbA9xYVc9V1Y+BCeC0tkxU1aNV9TPg\nRmBNkgDvBm5p+28Ezhvqa2NbvwU4s7Wf6RiSpBHpuWdyaZL72zTY0lY7Dnh8qM2OVpup/kbgp1X1\n/JT6S/pq259u7WfqS5I0InMNk6uBNwMrgV3AZ+dtRPMoybok40nG9+zZM+rhSNIr1pzCpKqeqKoX\nqupF4Ev8cpppJ3D8UNNlrTZT/UngyCSHTam/pK+2/Q2t/Ux9TTfOa6pqVVWtGhsbm8uPKknaD3MK\nkyTHDr19HzD5pNcm4IL2JNaJwArgu8A9wIr25NbhDG6gb6qqAu4Ezm/7rwVuG+prbVs/H/h2az/T\nMSRJI3LYvhokuQE4Azg6yQ7gcuCMJCuBArYDHwaoqgeT3Az8A
HgeuKSqXmj9XArcDiwBNlTVg+0Q\nHwduTPIp4D7g2la/FvhykgkGDwBcsK9jSJJGI4P/7L/yrVq1qsbHx+e8//L135i2vv3K98y5T0la\n7JLcW1Wr9tXOT8BLkroZJpKkboaJJKmbYSJJ6maYSJK6GSaSpG6GiSSpm2EiSepmmEiSuhkmkqRu\nhokkqZthIknqZphIkroZJpKkboaJJKmbYSJJ6maYSJK6GSaSpG6GiSSpm2EiSepmmEiSuu0zTJJs\nSLI7ybah2n9J8qMk9yf5epIjW315kr9LsrUtXxza59QkDySZSPK5JGn1o5JsSfJwe13a6mntJtpx\nThnqa21r/3CStfN5QiRJB25/rkyuA1ZPqW0BTq6q3wT+GrhsaNsjVbWyLR8Zql8NfAhY0ZbJPtcD\nd1TVCuCO9h7gnKG269r+JDkKuBx4O3AacPlkAEmSRmOfYVJV3wH2Tql9q6qeb2/vApbN1keSY4HX\nV9VdVVXA9cB5bfMaYGNb3zilfn0N3AUc2fo5G9hSVXur6ikGwTY17CRJC2g+7pn8a+CbQ+9PTHJf\nkr9K8s5WOw7YMdRmR6sBHFNVu9r6T4BjhvZ5fJp9Zqr/iiTrkownGd+zZ88B/liSpP3VFSZJ/hh4\nHvhKK+0CTqiqtwF/CHw1yev3t7921VI9Y5rS3zVVtaqqVo2Njc1Xt5KkKeYcJkl+H/gd4PdaCFBV\nz1XVk239XuAR4NeAnbx0KmxZqwE80aavJqfDdrf6TuD4afaZqS5JGpE5hUmS1cAfAe+tqmeH6mNJ\nlrT1NzG4ef5om8Z6Jsnp7Smui4Db2m6bgMknstZOqV/Unuo6HXi69XM7cFaSpe3G+1mtJkkakcP2\n1SDJDcAZwNFJdjB4kuoy4AhgS3vC96725Na7gCuS/Bx4EfhIVU3evP8ogyfDXs3gHsvkfZYrgZuT\nXAw8Bry/1TcD5wITwLPABwGqam+STwL3tHZXDB1DkjQC+wyTqrpwmvK1M7S9Fbh1hm3jwMnT1J8E\nzpymXsAlM/S1Adgw86glSQvJT8BLkroZJpKkboaJJKmbYSJJ6maYSJK6GSaSpG6GiSSpm2EiSepm\nmEiSuhkmkqRuhokkqZthIknqZphIkroZJpKkboaJJKmbYSJJ6maYSJK6GSaSpG6GiSSpm2EiSepm\nmEiSuu1XmCTZkGR3km1DtaOSbEnycHtd2upJ8rkkE0nuT3LK0D5rW/uHk6wdqp+a5IG2z+eSZK7H\nkCQtvP29MrkOWD2lth64o6pWAHe09wDnACvasg64GgbBAFwOvB04Dbh8Mhxamw8N7bd6LseQJI3G\nfoVJVX0H2DulvAbY2NY3AucN1a+vgbuAI5McC5wNbKmqvVX1FLAFWN22vb6q7qqqAq6f0teBHEOS\nNAI990yOqapdbf0nwDFt/Tjg8aF2O1pttvqOaepzOcZLJFmXZDzJ+J49ew7gR5MkHYh5uQHfrihq\nPvqaz2NU1TVVtaqqVo2Njb1MI5Mk9YTJE5NTS+11d6vvBI4fares1WarL5umPpdjSJJGoCdMNgGT\nT2StBW4bql/Unrg6HXi6TVXdDpyVZGm78X4WcHvb9kyS09tTXBdN6etAjiFJGoHD9qdRkhuAM4Cj\nk+xg8FTWlcDNSS4GHgPe35pvBs4FJoBngQ8CVNXeJJ8E7mntrqiqyZv6H2XwxNirgW+2hQM9hiRp\nNPYrTKrqwhk2nTlN2wIumaGfDcCGaerjwMnT1J880GNIkhaen4CXJHUzTCRJ3QwTSVI3w0SS1M0w\nkSR1M0wkSd0ME0lSN8NEktTNMJEkdTNMJEndDBNJUjfDRJLUzTCRJHUzTCRJ3QwTSVI3w0SS1M0w\nkSR1M0wkSd0ME0lSN8NEktTNMJEkdZtzmCT59SRbh5ZnknwsySeS7Byqnzu0z2VJJpI8lOTsofrq\nVptIsn6ofmKSu1v9piSHt
/oR7f1E2758rj+HJKnfnMOkqh6qqpVVtRI4FXgW+HrbfNXktqraDJDk\nJOAC4K3AauALSZYkWQJ8HjgHOAm4sLUF+Ezr6y3AU8DFrX4x8FSrX9XaSZJGZL6muc4EHqmqx2Zp\nswa4saqeq6ofAxPAaW2ZqKpHq+pnwI3AmiQB3g3c0vbfCJw31NfGtn4LcGZrL0kagfkKkwuAG4be\nX5rk/iQbkixtteOAx4fa7Gi1mepvBH5aVc9Pqb+kr7b96db+JZKsSzKeZHzPnj09P58kaRbdYdLu\nY7wX+PNWuhp4M7AS2AV8tvcYc1VV11TVqqpaNTY2NqphSNIr3nxcmZwDfK+qngCoqieq6oWqehH4\nEoNpLICdwPFD+y1rtZnqTwJHJjlsSv0lfbXtb2jtJUkjMB9hciFDU1xJjh3a9j5gW1vfBFzQnsQ6\nEVgBfBe4B1jRntw6nMGU2aaqKuBO4Py2/1rgtqG+1rb184Fvt/aSpBE4bN9NZpbkNcBvAx8eKv9J\nkpVAAdsnt1XVg0luBn4APA9cUlUvtH4uBW4HlgAbqurB1tfHgRuTfAq4D7i21a8FvpxkAtjLIIAk\nSSPSFSZV9f+YcuO7qj4wS/tPA5+epr4Z2DxN/VF+OU02XP974HfnMGRJ0svAT8BLkroZJpKkboaJ\nJKlb1z0TDSxf/41p69uvfM8Cj0SSRsMrE0lSN8NEktTNMJEkdfOeyQLwnoqkVzqvTCRJ3QwTSVI3\nw0SS1M0wkSR1M0wkSd0ME0lSN8NEktTNMJEkdfNDiyM20wcawQ81Sjp4eGUiSepmmEiSuhkmkqRu\n3WGSZHuSB5JsTTLeakcl2ZLk4fa6tNWT5HNJJpLcn+SUoX7WtvYPJ1k7VD+19T/R9s1sx5AkLbz5\nujL551W1sqpWtffrgTuqagVwR3sPcA6woi3rgKthEAzA5cDbgdOAy4fC4WrgQ0P7rd7HMSRJC+zl\nmuZaA2xs6xuB84bq19fAXcCRSY4Fzga2VNXeqnoK2AKsbtteX1V3VVUB10/pa7pjSJIW2HyESQHf\nSnJvknWtdkxV7WrrPwGOaevHAY8P7buj1War75imPtsxJEkLbD4+Z/KOqtqZ5B8BW5L8aHhjVVWS\nmofjzGimY7RwWwdwwgknvJxDkKRDWveVSVXtbK+7ga8zuOfxRJuior3ubs13AscP7b6s1WarL5um\nzizHGB7bNVW1qqpWjY2N9fyYkqRZdIVJktcked3kOnAWsA3YBEw+kbUWuK2tbwIuak91nQ483aaq\nbgfOSrK03Xg/C7i9bXsmyentKa6LpvQ13TEkSQusd5rrGODr7Wndw4CvVtX/SnIPcHOSi4HHgPe3\n9puBc4EJ4FnggwBVtTfJJ4F7WrsrqmpvW/8ocB3wauCbbQG4coZjSJIWWFeYVNWjwD+Zpv4kcOY0\n9QIumaGvDcCGaerjwMn7ewxJ0sLzE/CSpG6GiSSpm2EiSepmmEiSuhkmkqRuhokkqZthIknqZphI\nkroZJpKkboaJJKmbYSJJ6jYff89EL6Pl678xbX37le9Z4JFI0sy8MpEkdTNMJEndDBNJUjfDRJLU\nzTCRJHUzTCRJ3QwTSVI3w0SS1M0wkSR1m3OYJDk+yZ1JfpDkwSR/0OqfSLIzyda2nDu0z2VJJpI8\nlOTsofrqVptIsn6ofmKSu1v9piSHt/oR7f1E2758rj+HJKlfz5XJ88C/q6qTgNOBS5Kc1LZdVVUr\n27IZoG27AHgrsBr4QpIlSZYAnwfOAU4CLhzq5zOtr7cATwEXt/rFwFOtflVrJ0kakTmHSVXtqqrv\ntfW/BX4IHDfLLmuAG6vquar6MTABnNaWiap6tKp+BtwIrEkS4N3ALW3/jcB5Q31tbOu3AGe29pKk\nEZiXeyZtmultwN2tdGmS+5NsSLK01Y4DHh/abUerzVR/I/DTqnp+Sv0lfbXtT7f2kqQR6A6
TJK8F\nbgU+VlXPAFcDbwZWAruAz/Yeo2Ns65KMJxnfs2fPqIYhSa94XWGS5FUMguQrVfU1gKp6oqpeqKoX\ngS8xmMYC2AkcP7T7slabqf4kcGSSw6bUX9JX2/6G1v4lquqaqlpVVavGxsZ6flRJ0ix6nuYKcC3w\nw6r606H6sUPN3gdsa+ubgAvak1gnAiuA7wL3ACvak1uHM7hJv6mqCrgTOL/tvxa4baivtW39fODb\nrb0kaQR6/jjWbwEfAB5IsrXV/gODp7FWAgVsBz4MUFUPJrkZ+AGDJ8EuqaoXAJJcCtwOLAE2VNWD\nrb+PAzcm+RRwH4Pwor1+OckEsJdBAB2S/ONZkhaDOYdJVf0fYLonqDbPss+ngU9PU9883X5V9Si/\nnCYbrv898LsHMl5J0svHT8BLkrr5N+BfwWaaAoPBNJhTZJLmi2GiGc0WNvsKot7tkg4uTnNJkrp5\nZaJFx6sW6eDjlYkkqZtXJjroeD9GWnwMEx1SDCLp5WGYSAfAsJGmZ5hI88SrHh3KDBNpEfADpjrY\nGSbSK4Bho1Hz0WBJUjevTKRXuH1NoUnzwTCRDnE9Dw44vaZJhomkl41PuB06DBNJi5LfTH1wMUwk\nHXJezqA6VEPOMJGkRaL380ajDCofDZYkdTNMJEndDuowSbI6yUNJJpKsH/V4JOlQddCGSZIlwOeB\nc4CTgAuTnDTaUUnSoemgDRPgNGCiqh6tqp8BNwJrRjwmSTokHcxhchzw+ND7Ha0mSVpgqapRj2FO\nkpwPrK6qf9PefwB4e1VdOtRmHbCuvf114KF5OvzRwN/MU1/zyXEdmMU6Lli8Y3NcB+aVMK5/XFVj\n+2p0MH/OZCdw/ND7Za32C1V1DXDNfB84yXhVrZrvfns5rgOzWMcFi3dsjuvAHErjOpinue4BViQ5\nMcnhwAXAphGPSZIOSQftlUlVPZ/kUuB2YAmwoaoeHPGwJOmQdNCGCUBVbQY2j+DQ8z51Nk8c14FZ\nrOOCxTs2x3VgDplxHbQ34CVJi8fBfM9EkrRIGCYHYLF+fUuS7UkeSLI1yfiIx7Ihye4k24ZqRyXZ\nkuTh9rp0kYzrE0l2tvO2Ncm5IxjX8UnuTPKDJA8m+YNWH+k5m2Vci+Gc/cMk303y/Ta2/9TqJya5\nu/1+3tQezFkM47ouyY+HztnKhRzX0PiWJLkvyV+29/N7vqrKZT8WBjf5HwHeBBwOfB84adTjamPb\nDhw96nG0sbwLOAXYNlT7E2B9W18PfGaRjOsTwL8f8fk6Fjilrb8O+GsGXw800nM2y7gWwzkL8Nq2\n/irgbuB04Gbgglb/IvBvF8m4rgPOH+U5a2P6Q+CrwF+29/N6vrwy2X9+fct+qKrvAHunlNcAG9v6\nRuC8BR0UM45r5KpqV1V9r63/LfBDBt/kMNJzNsu4Rq4G/m97+6q2FPBu4JZWH8U5m2lcI5dkGfAe\n4L+392Gez5dhsv8W89e3FPCtJPe2T/0vNsdU1a62/hPgmFEOZopLk9zfpsEWfPptWJLlwNsY/I92\n0ZyzKeOCRXDO2pTNVmA3sIXBrMFPq+r51mQkv59Tx1VVk+fs0+2cXZXkiIUeF/BfgT8CXmzv38g8\nny/D5JXhHVV1CoNvUL4kybtGPaCZ1OCaelH8bw24GngzsBLYBXx2VANJ8lrgVuBjVfXM8LZRnrNp\nxrUozllVvVBVKxl888VpwG+MYhxTTR1XkpOByxiM758CRwEfX8gxJfkdYHdV3ftyHscw2X/7/PqW\nUamqne11N/B1Br9ci8kTSY4FaK+7RzweAKrqifbL/yLwJUZ03pK8isE/2F+pqq+18sjP2XTjWizn\nbFJV/RS4E/hnwJFJJj87N9Lfz6FxrW5ThlVVzwH/g4U/Z78FvDfJdgbT8+8G/hvzfL4Mk/23KL++\nJclrkrxuch04C9g2+14LbhOwtq2vBW4b4Vh+YfIf6+Z
9jOC8tbnra4EfVtWfDm0a6TmbaVyL5JyN\nJTmyrb8a+G0G93TuBM5vzUZxzqYb14+G/lMQBvclFvScVdVlVbWsqpYz+Hfr21X1e8z3+Rr1EwYH\n0wKcy+CplkeAPx71eNqY3sTgybLvAw+OelzADQymP37OYB72Ygbzs3cADwP/GzhqkYzry8ADwP0M\n/vE+dgTjegeDKaz7ga1tOXfU52yWcS2Gc/abwH1tDNuA/9jqbwK+C0wAfw4csUjG9e12zrYB/5P2\nxNcoFuAMfvk017yeLz8BL0nq5jSXJKmbYSJJ6maYSJK6GSaSpG6GiSSpm2EiSepmmEiSuhkmkqRu\n/x+QWmfWiDbdUQAAAABJRU5ErkJggg==\n", 41 | "text/plain": [ 42 | "" 43 | ] 44 | }, 45 | "metadata": {}, 46 | "output_type": "display_data" 47 | } 48 | ], 49 | "source": [ 50 | "plt.bar(range(40), counts[:40])\n", 51 | "plt.show()" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 4, 57 | "metadata": { 58 | "collapsed": true 59 | }, 60 | "outputs": [], 61 | "source": [ 62 | "n = len(counts)\n", 63 | "covered = [counts[0]] * n\n", 64 | "for i in range(1, len(counts)):\n", 65 | " covered[i] = covered[i-1] + counts[i]\n", 66 | "total = covered[-1]\n", 67 | "covered = [x / total for x in covered]" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 7, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "data": { 77 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl0XeV57/HvI1mzZNmaLI9IHrCxGQwIAyFhDGBIg2lC\nWpumhVsaJw00aZOSwE0XUJrbpqRJk7S+pZRQCCE4hEJwbpxFJkgYAkjGxtjGxsKyLcmDRssaLB0N\nz/3jHDsHIVkH+0hb5+j3Weuss4fX2s9m2z+23r33u83dERGR5JISdAEiIhJ/CncRkSSkcBcRSUIK\ndxGRJKRwFxFJQgp3EZEkpHAXEUlCCncRkSSkcBcRSUKTgtpwUVGRl5WVBbV5EZGEtGHDhiZ3Lx6p\nXWDhXlZWRlVVVVCbFxFJSGa2J5Z26pYREUlCCncRkSSkcBcRSUIKdxGRJKRwFxFJQgp3EZEkpHAX\nEUlCgd3nLiKSrHr7B2g70suhrl7ajoSipsPfV5xWwpmzpoxqDQp3EZHj6O7tp6UzREtniENdvbR0\nhWjtDNEa+W7p6uVQV3h9a2c4yDtD/cf9mcV5GQp3EZF46uzpo6mjh6aOUOS7h6b2EC2dPe8J6tau\nXo70Dh/U+VlpTM1OY2pOOqWTM1lUOpkp2WlMyUojPzuN/KzwZ0p2evg7K428zElMSh39HnGFu4gk\nvCOhfg4e7ubg4W4aO3poag+Hd3NnD43tvw/x5o7QsGGdlzmJwpx0puakMy0S1AU54eCemh3+FOSk\nHwvzKVlpYxLSJ0rhLiLjVm//AI3tPceC++DhnkHf4c/h7r73/NkUg4KcdIpyMyjKzaCsMJui3AwK\nczMoyk2nKC+D4twMCnPTKczJIH3S+A3qE6FwF5Ex5+60dvWyv+3IEGH9++nmzhDu7/6zk1KMkrwM\nSiZnMrc4hwvnFTJtcmbkk0FxXjjMp2ank5piwezgOKBwF5G4O9zdy/5D3exrO8L+Q93sbzvCvsj3\n/rbwd3fvwHv+XFFuOiV5mZTmZ3LmrHxK8sKhXZqfcWy6MCedlAkc2rFSuIvI+9LXP8D+tm5qW7uo\nb/19aO9r62b/oXB4d/S8u5skxaAkL5PpUzJZPH0yVywqYfqULGbkZzItP5PSyZkU5SZf10iQFO4i\n8i4DA87B9m7qWo9Q29J17Lu2NTy9v62b/oF395UU5aYzPT+L8qIcLppfxPT8zGPhPX1KFtPyMsb1\nxcdkFFO4m9ly4NtAKvCgu39t0PpTgIeAYqAF+KS718W5VhGJA3enuTPE3qjgrms9Ql1rF7UtXew7\n1E2o/91dJiV5GcwuyKbilKnMmprN7IIsZk3NZtbULErzM8mYlBrQ3shwRgx3M0sF1gBXAnVApZmt\nc/dtUc3+Bfieuz9iZpcD/wT86WgULCKxaevqpaa5k91NnexqCn/vbu6kpqmT9kF3lxTkpDN7ahZL\nZuZz9emlzJ6azeyCcHjPnJJFZprCO9HEcua+DKh2910AZrYWWAFEh/ti4AuR6eeAH8ezSBEZWmdP\nHzVN4cDe3dR5LMxrmjpp7eo91s4MZk4Jd5tcv3QmZUU5nFLw+wDPyVAPbbKJ5YjOBGqj5uuA8we1\neQP4GOGumz8E8sys0N2b41JltJ/dAQfejPuPFRnPevsHONLbH/6Efv99tPukOPK5KDWFzLRUMrNS\nyJycGp5OSyFzUiopFrnDpDXykeCUngHXfG3kdichXv+7/lvg383sZuC3QD3wnsfAzGw1sBpgzpw5\ncdq0SPLo7R+gK9RPV6jvXUHeF3UBM8WMrLRUJmelkXU0vNPCQZ5qukVQwmIJ93pgdtT8rMiyY9x9\nH+Ezd8wsF/i4ux8a/IPc/QHgAYCKigofvD4mo/x/O5Gx0N3bT3VDB9sPtLPjwGG2H2hn+4F2Gtt7\njrWZkp3G/OJcFkzLZV5xLvNLwp8Z+Vm6z1tGFEu4VwILzKycc
KivBG6MbmBmRUCLuw8AdxK+c0ZE\ngJbOEFv3tbGl/jBb97Wx/UA7NU2dx24nzJiUwoJpuVxyajGLSvNYGPkU52ZgOhOXEzRiuLt7n5nd\nBjxL+FbIh9x9q5ndC1S5+zrgUuCfzMwJd8vcOoo1i4xbDe3dbK0/zJb6Nt6sb2PrvsPUHzpybP3s\ngiwWlU7mmtNLWVQ6mYWleZQVZusecIk788EDN4yRiooKr6qqCmTbIvHQ1NHDpr2H2FzfxtZImDdE\ndavMLcrh9Jn5nD5zMqfPyGfJjHzys9MCrFiSgZltcPeKkdrp/ieRGHT39rOlvo1NtYfYWHuIN2oP\nUdcaPiNPMZhfkssH5xdFwjyf06bnkZepIJfgKNxFhtBwuJvXdrfwWk0Lr+9tZfv+9mN3rMycksXS\n2VO46cIyls6ZwpIZk8lO1z8lGV/0N1ImPHenrvUIr9a08FpNM5W7W6lp6gQgOz2Vs+dM4dOXzGXp\n7KmcNTs8UqHIeKdwlwnH3alu6IiEeQuVu1vY39YNhG8/PK+sgBuXzWFZeQFLZkzWxU5JSAp3mRAO\ntHXzws5GXqxu4sWdTTR3hoDwgFjLygs4v7yAZeWFLCjJ1T3kkhQU7pKUukJ9vFrTwgtvN/FidSNv\nH+wAwkPTXnxqMRfOLeT8uQXMKcjWveSSlBTukhTcnZ0NHfzqrQZe2NlI1e5WQv0DpE9K4fzyAm44\ndxYfnB9+SEhn5jIRKNwlYfX09fPqrhZ+vb2BX7518NitiYtK87j5ojI+tKCI88oKNFytTEgKd0ko\nTR09PLe94dgZemeon8y0FD44v4jPXjqfyxeVUJqvu1lEFO4y7tUfOsL6zftZv2U/m2oP4Q6lkzNZ\ncfZMrlhUwgfmFZGVrrNzkWgKdxmX9h06wvo39/PTN/ezcW94gNHTZ07mr684lStOK2HJjMm6ECpy\nHAp3GTdaO0P8v837+PGmfWzYE36bxJIZk/nS8oV85IzpnFKYE3CFIolD4S6BCvUN8NyOBp56vY5f\nb2+gt99ZOC2P269eyLVnTKe8SIEuciIU7jLm3J1NtYd46vV6frJ5H4e6einKzeCmC8v42DmzWDxj\nctAliiQ8hbuMmdbOEE9trGfta3vZ2dBBxqQUrlpSysfOmcmH5hfpMX+ROIop3M1sOeGXX6cCD7r7\n1watnwM8AkyJtLnD3dfHuVZJQO7OqzUtrH1tL+u3HCDUN8DS2VP42sfO4NozpzNZw+KKjIoRw93M\nUoE1wJVAHVBpZuvcfVtUs78DnnD3/zCzxcB6oGwU6pUE0doZ4skNdTxeuZddjZ3kZU5i5XmzWXne\nHHW7iIyBWM7clwHV7r4LwMzWAiuA6HB34Oi/2HxgXzyLlMRR3dDBQy/V8NTrdXT3DnDuKVP5l0/M\n5yNnTNe96CJjKJZwnwnURs3XAecPanMP8HMz+ysgB/hwXKqThODuvFTdzHdf3MVzOxpJn5TCx86e\nyc0XlbGoVGfpIkGI1wXVVcDD7v4NM7sQeNTMTnf3gehGZrYaWA0wZ86cOG1agtLT188zm/bx0Is1\nbD/QTlFuOn/z4VP5kwvmUJSbEXR5IhNaLOFeD8yOmp8VWRbtFmA5gLv/zswygSKgIbqRuz8APADh\nF2SfYM0SsMPdvXzv5d08/PIemjp6WDgtj/tuOJPrzpqhQbpExolYwr0SWGBm5YRDfSVw46A2e4Er\ngIfN7DQgE2iMZ6ESvLYjvTz80m6+++IuDnf3ccmpxXzqQ3O5aH6hhgIQGWdGDHd37zOz24BnCd/m\n+JC7bzWze4Eqd18HfBH4LzP7G8IXV292d52ZJ4m2rl6++1IN//1SDe3dfVy5eBqfu3wBZ8zKD7o0\nERlGTH3ukXvW1w9adlfU9DbgoviWJkFr7Qzx3RdrePjl3XT09LF8SSl/dcV8lsxQqIuMd3pCVd6j\nK9THQy/W8J+/2UVHqI9rT
5/ObZfP57TpuvNFJFEo3OWY3v4BflhZy7d/tZPG9h6uXDyNv71qIQtL\n84IuTUTeJ4W74O78bMsBvv7sDmqaOllWVsD9nzyHc08pCLo0ETlBCvcJ7u2D7dz9zFZ+t6uZhdPy\neOjmCi5bWKK7X0QSnMJ9gmrv7uVbv9zJwy/vJjdjEl+9/nRWLZtDaopCXSQZKNwnGHfn6Y31/OP6\n7TR39rDyvDncfvVCCnLSgy5NROJI4T6B7DzYzv9++k0qd7eydPYUHrq5gjNnTQm6LBEZBQr3CSDU\nN8D9v3mHf/91NdkZqdz38TO54dxZpKgLRiRpKdyT3KbaQ3z5yc3sONjOR8+awd0fXaxBvUQmAIV7\nkuoK9fGNn7/Nf79UQ0leJg/+WQUfXjwt6LJEZIwo3JPQizubuPPpzdS2HOFPzp/Dl69ZpNfZiUww\nCvck0hXq4x/Xv8X3X9lLeVEOa1dfwAVzC4MuS0QCoHBPEq/vbeWLT7zB7uZObvlgObdfvVBjq4tM\nYAr3BNfbP8B3frWTNc9VMz0/ix/8xQVcOE9n6yITncI9ge1q7ODzazfxZn0bHz9nFndft1h96yIC\nKNwT1tMb6/jK01vImJTC/Z88l+WnlwZdkoiMIzGFu5ktB75N+E1MD7r71wat/1fgsshsNlDi7nr0\ncRR0hfq4+5mt/GhDHcvKCvj2qqVMz88KuiwRGWdGDHczSwXWAFcCdUClma2LvH0JAHf/m6j2fwWc\nPQq1TnjbDxzmth9s5J3GDj53+Xw+d8UCJqWmBF2WiIxDsZy5LwOq3X0XgJmtBVYA24Zpvwq4Oz7l\nyVFPbqjjK0+/yeSsNL5/y/lcNL8o6JJEZByLJdxnArVR83XA+UM1NLNTgHLg18OsXw2sBpgzZ877\nKnSiCvUN8NWfbuN7v9vDB+YV8u2VZ1Ocp+EDROT44n1BdSXwpLv3D7XS3R8AHgCoqKjwOG876TS0\nd3PrY69TubuV1RfP5UtXL1Q3jIjEJJZwrwdmR83Piiwbykrg1pMtSsIPJf3l9zfQdqSX76w6m+vO\nmhF0SSKSQGIJ90pggZmVEw71lcCNgxuZ2SJgKvC7uFY4AT3+2l7uemYL0/OzePqzyzht+uSgSxKR\nBDNiuLt7n5ndBjxL+FbIh9x9q5ndC1S5+7pI05XAWndXd8sJ6unr555123j8tb1cfGox31m5lCnZ\nekOSiLx/MfW5u/t6YP2gZXcNmr8nfmVNPM0dPXz60Q1U7Wnls5fO44tXLdT7TEXkhOkJ1XFg58F2\n/vyRShoO9/Bvq87mo+pfF5GTpHAP2As7G/nsY6+TMSmVtasv4Ow5U4MuSUSSgMI9QGtf28tXfryF\nBSW5PHhTBbOmZgddkogkCYV7QO7/zTt87WfbueTUYtb8yTnkZuhQiEj8KFHGmLtz37M7+I/n3+EP\nzpzON/9oKemT9GCSiMSXwn0M9Q84dz2zhcde3cuqZXP46vWn644YERkVCvcx0tc/wBd/9AbPbNrH\nZy6Zx5eXL8RMwS4io0PhPgZCfQN8fu1GfrblALdfvZBbL5sfdEkikuQU7qOsp6+fWx97nV++1cDf\nfeQ0/uJDc4MuSUQmAIX7KOru7Wf1oxv47duN/MOKJfzphWVBlyQiE4TCfZR0hfq45eEqXqlp5p8/\nfgZ/fJ7GrxeRsaNwHwXt3b38+cOVbNjTyjf/6Cz+8OxZQZckIhOMwj3OOnv6uPm/K3mj9hD/tuoc\nPnLm9KBLEpEJSOEeR929/Xz60Q1s3NvKmhvP4ZozFOwiEgyFe5z09Q/wucc38mJ1E//yibMU7CIS\nqJieezez5Wa2w8yqzeyOYdr8kZltM7OtZvaD+JY5vg0MOF96cjM/33aQv79uCTecqz52EQnWiGfu\nZpYKrAGuBOqASjNb5+7botosAO4ELnL3VjMrGa2Cxxt3556fbOWpjfXcfvVCbvpAWdAliYj
EdOa+\nDKh2913uHgLWAisGtfkUsMbdWwHcvSG+ZY5f//f5d/je7/aw+uK5fPbSeUGXIyICxBbuM4HaqPm6\nyLJopwKnmtlLZvaKmS0f6geZ2WozqzKzqsbGxhOreBx56vU6vv7sDq5fOoM7li/SWDEiMm7Ea6zZ\nScAC4FJgFfBfZjZlcCN3f8DdK9y9ori4OE6bDsYLOxv50pOb+cC8Qu674SxSNLqjiIwjsYR7PTA7\nan5WZFm0OmCdu/e6ew3wNuGwT0pb6tv4zKMbmF+Sy/1/eq7GYxeRcSeWVKoEFphZuZmlAyuBdYPa\n/JjwWTtmVkS4m2ZXHOscNxoOd3PLI5XkZ6Xx8P9axuTMtKBLEhF5jxHD3d37gNuAZ4G3gCfcfauZ\n3Wtm10WaPQs0m9k24DngdndvHq2igxLqG+AvH3udw0f6+O7N51Ganxl0SSIiQ4rpISZ3Xw+sH7Ts\nrqhpB74Q+SSte36ylQ17Wvn3G8/mtOmTgy5HRGRY6iyO0eOv7eUHr+7lM5fM4w/OnBF0OSIix6Vw\nj8G2fYe5+5mtXHxqMbdfvTDockRERqRwH8GRUD+fW7uRKdlpfOuPl+qF1iKSEDRw2Ai++tNtVDd0\n8P1bzqcgJz3ockREYqIz9+N4qbqJx17dy6c+VM4HFxQFXY6ISMwU7sM4Eurnfz/9JuVFOXzxKvWz\ni0hiUbfMML71q7fZ09zF45+6gMy01KDLERF5X3TmPoQt9W08+EINK8+bzYXzCoMuR0TkfVO4D9LX\nP8CX/2czBTnp3HnNaUGXIyJyQhTug6ytrGXrvsPc/dHF5Gdr3BgRSUwK9yhtXb184+c7OL+8gI/o\nHagiksAU7lH+9Zdv03akl7s/ukQv3hCRhKZwj9jb3MWjr+xh5bI5LJ6hQcFEJLEp3CP+87fvkGrG\n569I2neMiMgEonAn/AKOH1XVcUPFLKZN1hjtIpL4Ygp3M1tuZjvMrNrM7hhi/c1m1mhmmyKfv4h/\nqaPnwRdr6BsY4DMXzwu6FBGRuBjxCVUzSwXWAFcSfldqpZmtc/dtg5r+0N1vG4UaR1XbkV4ee2UP\n1501gzmF2UGXIyISF7GcuS8Dqt19l7uHgLXAitEta+z8qKqWzlA/n7p4btCliIjETSzhPhOojZqv\niywb7ONmttnMnjSz2XGpbpQNDDiPvrKH88qmsmRGftDliIjETbwuqP4EKHP3M4FfAI8M1cjMVptZ\nlZlVNTY2xmnTJ+43OxvZ09zFn11YFnQpIiJxFUu41wPRZ+KzIsuOcfdmd++JzD4InDvUD3L3B9y9\nwt0riouLT6TeuPrey7spycvg6iWlQZciIhJXsYR7JbDAzMrNLB1YCayLbmBm0c/qXwe8Fb8SR8fu\npk6ef7uRVcvmkD5Jd4SKSHIZ8W4Zd+8zs9uAZ4FU4CF332pm9wJV7r4O+JyZXQf0AS3AzaNYc1x8\n/5U9pJpx4/lzgi5FRCTuYnpZh7uvB9YPWnZX1PSdwJ3xLW30HAn180RVLctPL9VDSyKSlCZkf8RP\n39zP4e4+PnnBKUGXIiIyKiZkuD9RVUt5UQ7nlxcEXYqIyKiYcOFe09TJazUtfKJilob1FZGkNeHC\n/YmqWlJTjBvOmRV0KSIio2ZChXtf/wD/s6GOyxYWU6ILqSKSxCZUuL9Q3URDew+fqEiI0RFERE7Y\nhAr3ZzbWk5+VxmULS4IuRURkVE2YcO8K9fHzbQe59ozpeiJVRJLehEm5X2w7SFeon+uXzgi6FBGR\nUTdhwv2ZTfuYkZ/JeWW6t11Ekt+ECPeWzhC/fbuRjy6dQUqK7m0XkeQ3IcL9p5v30TfgXL90qHeM\niIgknwkR7j/etI+F0/I4bfrkoEsRERkTSR/utS1dbNjTyoqzdSFVRCaOpA/3Z7ceAOAPzlC4i8jE\nEVO4m9lyM9thZtVmdsdx2n3czNzMKuJX4sn5xbaDLCr
NY05hdtCliIiMmRHD3cxSgTXANcBiYJWZ\nLR6iXR7weeDVeBd5olo6Q1TubuHKxdOCLkVEZEzFcua+DKh2913uHgLWAiuGaPcPwD8D3XGs76T8\nensDA47CXUQmnFjCfSZQGzVfF1l2jJmdA8x295/GsbaT9ottByidnMkZM/ODLkVEZEyd9AVVM0sB\nvgl8MYa2q82sysyqGhsbT3bTx9Xd289v327iw4tL9FIOEZlwYgn3eiB6jNxZkWVH5QGnA8+b2W7g\nAmDdUBdV3f0Bd69w94ri4uITrzoGL7/TxJHefq5cXDqq2xERGY9iCfdKYIGZlZtZOrASWHd0pbu3\nuXuRu5e5exnwCnCdu1eNSsUxem57I1lpqVwwV2PJiMjEM2K4u3sfcBvwLPAW8IS7bzWze83sutEu\n8ES4O8/taOCi+YVkTEoNuhwRkTE3KZZG7r4eWD9o2V3DtL305Ms6Oe80dlLXeoRPXzIv6FJERAKR\nlE+oPr+jAYBLTx3dfn0RkfEqScO9kfklucwu0FOpIjIxJV24d/b08VpNC5ct1Fm7iExcSRfuL7/T\nTKh/gEv1EmwRmcCSLtyf39FATnoqFWVTgy5FRCQwSRXu7s7zOxr5wPwi3QIpIhNaUoV7dUMH9YeO\ncJm6ZERkgkuqcH/u6C2QupgqIhNcUoX78zsaOXVaLjOmZAVdiohIoJIm3Dt6+qjc3aIuGRERkijc\nX6puorffuURdMiIiyRPuz+9oIDdjEhWnaBRIEZGkCPejt0BeNL+Q9ElJsUsiIiclKZJwx8F29rd1\nq79dRCQiKcL9xZ1NAFysUSBFRIAYw93MlpvZDjOrNrM7hlj/GTN708w2mdmLZrY4/qUO79WaFk4p\nzNYtkCIiESOGu5mlAmuAa4DFwKohwvsH7n6Guy8F7iP8wuwxMTDgVO5u4fxyXUgVETkqljP3ZUC1\nu+9y9xCwFlgR3cDdD0fN5gAevxKPb8fBdg519bKsvHCsNikiMu7F8pq9mUBt1HwdcP7gRmZ2K/AF\nIB24PC7VxWDj3kMALCvTmbuIyFFxu6Dq7mvcfR7wZeDvhmpjZqvNrMrMqhobG+Oy3eqGDrLSUpk1\nVf3tIiJHxRLu9cDsqPlZkWXDWQtcP9QKd3/A3SvcvaK4OD53tuxq6qC8KIeUFIvLzxMRSQaxhHsl\nsMDMys0sHVgJrItuYGYLomY/AuyMX4nHt6uxk7nFOWO1ORGRhDBiuLt7H3Ab8CzwFvCEu281s3vN\n7LpIs9vMbKuZbSLc737TqFUcpaevn7rWLuYW547F5kREEkYsF1Rx9/XA+kHL7oqa/nyc64rJnuYu\nBhzm6cxdRORdEvoJ1V2NHQDMLdKZu4hItIQO97rWIwDMKcgOuBIRkfElocO9ob2HjEkpTM6KqXdJ\nRGTCSOxwP9xNyeQMzHQbpIhItMQO9/YeSvIygy5DRGTcSfhwL87NCLoMEZFxJ7HDPdItIyIi75aw\n4d7d28/h7j5K8hTuIiKDJWy4N7b3AKjPXURkCAkb7g2RcC9Wt4yIyHskbLg3tncDqFtGRGQICRvu\nx87cFe4iIu+RsOHe1BHCDApzFO4iIoMlbLg3d/RQkJ1Oql7SISLyHgkc7iEKc9ODLkNEZFxK3HDv\n7FGXjIjIMGIKdzNbbmY7zKzazO4YYv0XzGybmW02s1+Z2SnxL/XddOYuIjK8EcPdzFKBNcA1wGJg\nlZktHtRsI1Dh7mcCTwL3xbvQwRo7eijSuDIiIkOK5cx9GVDt7rvcPQSsBVZEN3D359y9KzL7CjAr\nvmW+W09fP+3dfRTm6MxdRGQosYT7TKA2ar4usmw4twA/G2qFma02syozq2psbIy9ykFaOkMAFOrM\nXURkSHG9oGpmnwQqgK8Ptd7dH3D3CnevKC4uPuHtNHccDXeduYuIDCWW99PVA7Oj5mdFlr2LmX0Y\n+Apwibv3xKe8oTV
1hH98kcJdRGRIsZy5VwILzKzczNKBlcC66AZmdjbwn8B17t4Q/zLf7diZu26F\nFBEZ0ojh7u59wG3As8BbwBPuvtXM7jWz6yLNvg7kAj8ys01mtm6YHxcXzZ3hM/cCnbmLiAwplm4Z\n3H09sH7Qsruipj8c57qOq7kzRHpqCnkZMZUvIjLhJOQTqi0dIQpy0jHTuDIiIkNJzHDvDDFV97iL\niAwrIcO9uTOkB5hERI4jIcO9tSvcLSMiIkNLyHA/2ucuIiJDS7hw7+nrp71H48qIiBxPwoV7a2cv\ngC6oiogcR8KF+9EHmHTmLiIyvIQL96Nn7upzFxEZXsKF+7Ezdw09ICIyrIQL96NjuRdo0DARkWEl\nXLjPnJLFlYunkZ+VFnQpIiLjVsKNvHXVklKuWlIadBkiIuNawp25i4jIyBTuIiJJKKZwN7PlZrbD\nzKrN7I4h1l9sZq+bWZ+Z3RD/MkVE5P0YMdzNLBVYA1wDLAZWmdniQc32AjcDP4h3gSIi8v7FckF1\nGVDt7rsAzGwtsALYdrSBu++OrBsYhRpFROR9iqVbZiZQGzVfF1kmIiLj1JheUDWz1WZWZWZVjY2N\nY7lpEZEJJZZwrwdmR83Piix739z9AXevcPeK4uLiE/kRIiISg1j63CuBBWZWTjjUVwI3nuyGN2zY\n0GRme07wjxcBTSdbwzihfRmftC/jk/YFTomlkbn7yI3MrgW+BaQCD7n7/zGze4Eqd19nZucBTwNT\ngW7ggLsvOYGiY2JmVe5eMVo/fyxpX8Yn7cv4pH2JXUzDD7j7emD9oGV3RU1XEu6uERGRcUBPqIqI\nJKFEDfcHgi4gjrQv45P2ZXzSvsQopj53ERFJLIl65i4iIseRcOE+0iBm452Z7TazN81sk5lVRZYV\nmNkvzGxn5Htq0HUOxcweMrMGM9sStWzI2i3sO5HjtNnMzgmu8vcaZl/uMbP6yLHZFLlL7Oi6OyP7\nssPMrg6m6vcys9lm9pyZbTOzrWb2+cjyhDsux9mXRDwumWb2mpm9EdmXv48sLzezVyM1/9DM0iPL\nMyLz1ZH1ZSddhLsnzIfwrZjvAHOBdOANYHHQdb3PfdgNFA1adh9wR2T6DuCfg65zmNovBs4BtoxU\nO3At8DPAgAuAV4OuP4Z9uQf42yHaLo78XcsAyiN/B1OD3odIbdOBcyLTecDbkXoT7rgcZ18S8bgY\nkBuZTgNejfz3fgJYGVl+P/CXkenPAvdHplcCPzzZGhLtzP3YIGbuHgKODmKW6FYAj0SmHwGuD7CW\nYbn7b4FvNw+wAAACuklEQVSWQYuHq30F8D0PewWYYmbTx6bSkQ2zL8NZAax19x53rwGqCf9dDJy7\n73f31yPT7cBbhMd+Srjjcpx9Gc54Pi7u7h2R2bTIx4HLgScjywcfl6PH60ngCjOzk6kh0cI9GQYx\nc+DnZrbBzFZHlk1z9/2R6QPAtGBKOyHD1Z6ox+q2SHfFQ1HdYwmxL5Ff5c8mfJaY0Mdl0L5AAh4X\nM0s1s01AA/ALwr9ZHHL3vkiT6HqP7UtkfRtQeDLbT7RwTwYfdPdzCI+Pf6uZXRy90sO/lyXkLUyJ\nXHvEfwDzgKXAfuAbwZYTOzPLBf4H+Gt3Pxy9LtGOyxD7kpDHxd373X0p4Qc8lwGLxnL7iRbucRvE\nLCjuXh/5biA8ZMMy4ODRX40j3w3BVfi+DVd7wh0rdz8Y+Qc5APwXv/8Vf1zvi5mlEQ7Dx9z9qcji\nhDwuQ+1Loh6Xo9z9EPAccCHhbrCjIwNE13tsXyLr84Hmk9luooX7sUHMIleZVwLrAq4pZmaWY2Z5\nR6eBq4AthPfhpkizm4BngqnwhAxX+zrgzyJ3Z1wAtEV1E4xLg/qe/5DwsYHwvqyM3NFQDiwAXhvr\n+oYS6Zf9LvCWu38zalXCHZfh9iVBj0uxmU2JTGcBVxK+hvAccPRVpIOPy9HjdQPw6
8hvXCcu6KvK\nJ3AV+lrCV9HfAb4SdD3vs/a5hK/uvwFsPVo/4b61XwE7gV8CBUHXOkz9jxP+tbiXcH/hLcPVTvhu\ngTWR4/QmUBF0/THsy6ORWjdH/rFNj2r/lci+7ACuCbr+qLo+SLjLZTOwKfK5NhGPy3H2JRGPy5nA\nxkjNW4C7IsvnEv4fUDXwIyAjsjwzMl8dWT/3ZGvQE6oiIkko0bplREQkBgp3EZEkpHAXEUlCCncR\nkSSkcBcRSUIKdxGRJKRwFxFJQgp3EZEk9P8BE7XX2qW1z8MAAAAASUVORK5CYII=\n", 78 | "text/plain": [ 79 | "" 80 | ] 81 | }, 82 | "metadata": {}, 83 | "output_type": "display_data" 84 | } 85 | ], 86 | "source": [ 87 | "plt.plot(range(300), covered[:300])\n", 88 | "plt.plot(range(300), [0.9]*300)\n", 89 | "plt.show()" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 8, 95 | "metadata": { 96 | "collapsed": true 97 | }, 98 | "outputs": [], 99 | "source": [ 100 | "with open('data/selected.txt', 'wb') as fout:\n", 101 | " fout.write(''.join(emojis[:100]).encode('utf8'))\n", 102 | " fout.write(b'\\n')" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 9, 108 | "metadata": { 109 | "scrolled": true 110 | }, 111 | "outputs": [ 112 | { 113 | "name": "stdout", 114 | "output_type": "stream", 115 | "text": [ 116 | "😂\n", 117 | "😍\n", 118 | "😭\n", 119 | "❤\n", 120 | "🔥\n", 121 | "😊\n", 122 | "👏\n", 123 | "🏽\n", 124 | "😩\n", 125 | "🏼\n", 126 | "💕\n", 127 | "🏻\n", 128 | "🏾\n", 129 | "😘\n", 130 | "✨\n", 131 | "💯\n", 132 | "🙏\n", 133 | "🌹\n", 134 | "💀\n", 135 | "🙄\n", 136 | "🙌\n", 137 | "🎉\n", 138 | "👌\n", 139 | "♥\n", 140 | "💙\n", 141 | "👍\n", 142 | "🙃\n", 143 | "👀\n", 144 | "💖\n", 145 | "💪\n", 146 | "☺\n", 147 | "💋\n", 148 | "😎\n", 149 | "💜\n", 150 | "😉\n", 151 | "💥\n", 152 | "😳\n", 153 | "💗\n", 154 | "👉\n", 155 | "♀\n", 156 | "💞\n", 157 | "😏\n", 158 | "😁\n", 159 | "😅\n", 160 | "😈\n", 161 | "⚡\n", 162 | "💓\n", 163 | "😢\n", 164 | "😜\n", 165 | "😱\n", 166 | "😒\n", 167 | "😌\n", 168 | "🎶\n", 169 | "♡\n", 170 | "★\n", 171 | "🌟\n", 172 | "🏆\n", 173 | "💛\n", 174 | "🙂\n", 175 | "💦\n", 176 | "😋\n", 177 | "🙈\n", 178 | "✌\n", 179 | "👇\n", 180 | "💘\n", 181 | "💔\n", 182 | "💚\n", 183 | "👑\n", 184 | "😆\n", 185 | 
"😴\n", 186 | "➡\n", 187 | "👊\n", 188 | "😻\n", 189 | "😀\n", 190 | "✊\n", 191 | "🌸\n", 192 | "💃\n", 193 | "❄\n", 194 | "😫\n", 195 | "😇\n", 196 | "😔\n", 197 | "😛\n", 198 | "😄\n", 199 | "🐐\n", 200 | "♂\n", 201 | "🚨\n", 202 | "😤\n", 203 | "🗣\n", 204 | "👅\n", 205 | "😬\n", 206 | "✔\n", 207 | "😡\n", 208 | "💰\n", 209 | "❗\n", 210 | "🎈\n", 211 | "⚽\n", 212 | "😕\n", 213 | "📷\n", 214 | "💫\n", 215 | "✅\n", 216 | "\n", 217 | "\n" 218 | ] 219 | } 220 | ], 221 | "source": [ 222 | "with open('data/selected.txt', 'rb') as fin:\n", 223 | " for ch in fin.readline().decode('utf-8'):\n", 224 | " print(ch)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 10, 230 | "metadata": { 231 | "collapsed": true 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "with open('data/plain_dataset.pickle', 'rb') as fin:\n", 236 | " X, Y = pickle.load(fin)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 11, 242 | "metadata": { 243 | "collapsed": true 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "with open('data/dataset.pickle', 'rb') as fin:\n", 248 | " dataset = pickle.load(fin)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 2, 254 | "metadata": { 255 | "collapsed": true 256 | }, 257 | "outputs": [], 258 | "source": [ 259 | "with open('data/plain_dataset_meta.pickle', 'rb') as fin:\n", 260 | " dataset_meta = pickle.load(fin)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 15, 266 | "metadata": { 267 | "collapsed": true 268 | }, 269 | "outputs": [], 270 | "source": [ 271 | "def recover_sent(tokens, predict):\n", 272 | " alphabet = dataset_meta['alphabet']\n", 273 | " emojis = dataset_meta['emoji']\n", 274 | " return ''.join(alphabet[t] for t in tokens), emojis[predict]\n", 275 | "\n", 276 | "def recover_n(n):\n", 277 | " return recover_sent(X[n], Y[n])" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 16, 283 | "metadata": {}, 284 | "outputs": [ 
285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "('[START]bed, stay in bed... the feeling of your skin locked in my head. http t.co', '😍')\n", 290 | "(\"[START]steph curry is steph curry though it's was only that one game though lmao\", '😂')\n", 291 | "('[START]you know it was man', '😂')\n", 292 | "(\"[START]welcome n we've been looking forward to meeting you!it's gunna be a good un\", '👌')\n", 293 | "('[START]dont forget about the mass streaming tonight!! starting at npm kst!! url', '😊')\n", 294 | "('[START]yep!! they are shooting together!! happy na ang neyshen!! url', '😍')\n", 295 | "(\"[START]i'm not sure why we're having a bikini party in january but i'm not complaining url\", '😏')\n", 296 | "(\"[START]since its a new year.....my new year's resolution is to get to married\", '💕')\n", 297 | "('[START]runtown mad over you', '❤')\n", 298 | "('[START]i fucked with somebody nigga before and ima expose ... tf this gotta do with me ? url', '💀')\n", 299 | "(\"[START]do you think she's cute? yes or nah? yeeeeeeesss url\", '💖')\n", 300 | "(\"[START]does anybody else's cat sleep with their tongue out like this?\", '😂')\n", 301 | "(\"[START]can y'all niggas stop hating on obama like hop off trump is stupid af for ever and always!!!\", '😴')\n", 302 | "(\"[START]i'll forever love url\", '💀')\n", 303 | "('[START]never been that type to force it', '😩')\n", 304 | "('[START]healthy new years resolutions url', '🎉')\n", 305 | "('[START]envyavenue that arch by said energizer. 
url', '📷')\n", 306 | "('[START]da mf gasway bitch url', '🏽')\n", 307 | "('[START]really papa g am totally speachless', '😍')\n", 308 | "('[START]hate how ur bed seems nx cosier in the mornings', '😭')\n" 309 | ] 310 | } 311 | ], 312 | "source": [ 313 | "for i in range(80, 100):\n", 314 | " print(recover_n(i))" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": 11, 320 | "metadata": { 321 | "scrolled": false 322 | }, 323 | "outputs": [ 324 | { 325 | "data": { 326 | "text/plain": [ 327 | "'🎉\", \"🎈\", \"🏽\", \"🙄\", \"👑\", \"✨\", \"💞\", \"💕\", \"❤\", \"😏\", \"🔥\", \"😎\", \"💀\", \"😂\", \"😍\", \"😊\", \"😈\", \"♥\", \"💔\", \"😅\", \"🌟\", \"😜\", \"😭\", \"💗\", \"😋\", \"🌹\", \"😩\", \"💦\", \"♂\", \"🙏\", \"☺\", \"💯\", \"😆\", \"➡\", \"🙌\", \"💜\", \"✔\", \"💓\", \"💙\", \"😀\", \"👉\", \"😬\", \"👌\", \"😘\", \"♡\", \"🙃\", \"😁\", \"🙂\", \"👀\", \"💃\", \"💛\", \"👏\", \"👍\", \"😛\", \"💪\", \"💋\", \"😻\", \"😉\", \"😄\", \"😴\", \"💥\", \"💖\", \"😤\", \"🚨\", \"⚡\", \"😳\", \"🎶\", \"🗣\", \"👅\", \"😫\", \"✌\", \"💚\", \"🙈\", \"😇\", \"😒\", \"😌\", \"❗\", \"😢\", \"😕\", \"👊\", \"🌙\", \"👇\", \"😔\", \"❄\", \"💘\", \"✊\", \"💫\", \"😡\", \"♀\", \"🏆\", \"🌸\", \"★\", \"😱\", \"📷\", \"💰\", \"⚽\", \"🐐\", \"✅'" 328 | ] 329 | }, 330 | "execution_count": 11, 331 | "metadata": {}, 332 | "output_type": "execute_result" 333 | } 334 | ], 335 | "source": [ 336 | "'\", \"'.join(dataset_meta['emoji'])" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 12, 342 | "metadata": { 343 | "collapsed": true 344 | }, 345 | "outputs": [], 346 | "source": [ 347 | "from tensorflow import GraphDef" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": 13, 353 | "metadata": { 354 | "collapsed": true 355 | }, 356 | "outputs": [], 357 | "source": [ 358 | "g = GraphDef()" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": 14, 364 | "metadata": { 365 | "collapsed": true 366 | }, 367 | "outputs": [], 368 | "source": [ 369 | "with 
open('export/inference.pb', 'rb') as fin:\n", 370 | " sg = fin.read()\n", 371 | "g.ParseFromString(sg)" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 19, 377 | "metadata": {}, 378 | "outputs": [], 379 | "source": [ 380 | "with open('/tmp/graph.txt', 'w') as fout:\n", 381 | " fout.write(str(g))" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 20, 387 | "metadata": { 388 | "collapsed": true 389 | }, 390 | "outputs": [], 391 | "source": [ 392 | "with open('export/frozen.pb', 'rb') as fin:\n", 393 | " sg = fin.read()\n", 394 | "g.ParseFromString(sg)\n", 395 | "with open('/tmp/graph2.txt', 'w') as fout:\n", 396 | " fout.write(str(g))" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": { 403 | "collapsed": true 404 | }, 405 | "outputs": [], 406 | "source": [] 407 | } 408 | ], 409 | "metadata": { 410 | "kernelspec": { 411 | "display_name": "Python 3", 412 | "language": "python", 413 | "name": "python3" 414 | }, 415 | "language_info": { 416 | "codemirror_mode": { 417 | "name": "ipython", 418 | "version": 3 419 | }, 420 | "file_extension": ".py", 421 | "mimetype": "text/x-python", 422 | "name": "python", 423 | "nbconvert_exporter": "python", 424 | "pygments_lexer": "ipython3", 425 | "version": "3.6.1" 426 | } 427 | }, 428 | "nbformat": 4, 429 | "nbformat_minor": 2 430 | } 431 | -------------------------------------------------------------------------------- /build_dataset.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 25, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import json\n", 12 | "import re\n", 13 | "import pickle\n", 14 | "from collections import defaultdict" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 26, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "with 
open('data/stat.txt') as fin:\n", 24 | " emojis = set(fin.read().strip())\n" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 27, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "text/plain": [ 35 | "\"aaa i'm ddd n url gg !\"" 36 | ] 37 | }, 38 | "execution_count": 27, 39 | "metadata": {}, 40 | "output_type": "execute_result" 41 | } 42 | ], 43 | "source": [ 44 | "re_num = re.compile('-?[0-9]+(\\.)?[0-9]*')\n", 45 | "re_tag_user = re.compile('[#@][^\\s!?.]+')\n", 46 | "re_url = re.compile(r'(?:https?:\\/\\/)?(?:[\\w\\.]+\\/+)+(?:\\.?[\\w]{2,})+')\n", 47 | "re_invalid = re.compile('[^\\w.!?,\\']|RT[: ]')\n", 48 | "re_spaces = re.compile('\\s+')\n", 49 | "\n", 50 | "def preprocess(text):\n", 51 | " text = re_tag_user.sub('', text)\n", 52 | " text = re_num.sub('N', text)\n", 53 | " text = re_url.sub('URL', text)\n", 54 | " text = re_invalid.sub(' ', text)\n", 55 | " text = re_spaces.sub(' ', text)\n", 56 | " return text.strip().lower()\n", 57 | "\n", 58 | "preprocess('RT: aaa #bbb_ccc i\\'m ddd #e 1234.58 # https://t.co/orz/fjiewa gg @user!')" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 28, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "MAX_SAMPLES = 500000\n", 68 | "samples = defaultdict(set)\n", 69 | "with open('data/extracted.list') as fin:\n", 70 | " run = True\n", 71 | " for line in fin:\n", 72 | " if not line.strip():\n", 73 | " continue\n", 74 | " t = json.loads(line)\n", 75 | " text = t['text']\n", 76 | " cat = set()\n", 77 | " for ch in text:\n", 78 | " if ch in emojis:\n", 79 | " cat.add(ch)\n", 80 | " if not cat:\n", 81 | " continue\n", 82 | " normalized = preprocess(text)\n", 83 | " if not normalized:\n", 84 | " continue\n", 85 | " for ch in cat:\n", 86 | " if len(samples[ch]) < MAX_SAMPLES:\n", 87 | " samples[ch].add(normalized)\n", 88 | " if all(len(v) == MAX_SAMPLES for v in samples.values()):\n", 89 | " run = False\n", 90 | " if not run:\n", 91 | " break\n", 92 | 
"\n", 93 | "# picked data: map<emoji, set<text>>\n", 94 | "# where len(data[*]) == MAX_SAMPLES\n", 95 | "with open('data/dataset.pickle', 'wb') as fout:\n", 96 | " pickle.dump(samples, fout)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": { 103 | "collapsed": true 104 | }, 105 | "outputs": [], 106 | "source": [] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": { 112 | "collapsed": true 113 | }, 114 | "outputs": [], 115 | "source": [] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 22, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "name": "stdout", 124 | "output_type": "stream", 125 | "text": [ 126 | "happy new year!! url\n", 127 | "like.... url\n", 128 | "finally n !!!!! happy new year from mountain standard time !!!!!!!!!!\n", 129 | "when you say you're just gonna take a sip of wine and suddenly you're on your nth glass\n", 130 | "king of popsplit url url\n", 131 | "happy new years!!!\n", 132 | "happy new years url\n", 133 | "happy new year\n", 134 | "friends.... on a new day of n i wnts to say, open your blind eyes sickular hindu really they r in trouble https\n", 135 | "lionel richie all night long url\n", 136 | "orbitcounter\n", 137 | "i want your attention, but i won't beg for it\n", 138 | "henlo n\n", 139 | "amp i are so mean but we don't give a fuck\n", 140 | "she hid under the bed to test her relationship. what her boyfriend did had her floored url\n", 141 | "can't figure out if it's the end or beginning\n", 142 | "happy new years from the savage url url\n", 143 | "instagram cookiemonstern with url url\n", 144 | "happy new years. 
be safe amp stay blessed\n", 145 | "highly blessed to see another year\n", 146 | "if i said i was at church, don't ask me what my religion is!\n", 147 | "wishing our beautiful little wombat, wattle, a very happy nst birthday url\n", 148 | "oiya, happy new year n pic url\n", 149 | "hope n gives me someone loyal\n", 150 | "magic comes from what is inside you. jim butcher\n", 151 | "fuck it up then babe url\n", 152 | "new year, new socks all styles available at url join the sock game url\n", 153 | "best thing about n url\n", 154 | "when you and babe only bout each other url\n", 155 | "ready . url\n", 156 | "awe i love you too url\n", 157 | "all i wanted was a new years kiss\n", 158 | "so who gon wife me? url\n", 159 | "my lil brother arrived at el a few hours ago and he wants to go back home already. udikiwe yimonti url\n", 160 | "u clever son\n", 161 | "are you a tower, because eiffel for you.\n", 162 | "i really wanna repo this acc so i can have my username url\n", 163 | "thanks lil bro\n", 164 | "i don't need no friends solo mis n uvas\n", 165 | "qqvote ariana grande lady gaga sia\n", 166 | "rt if you at home tonight\n", 167 | "the art of knowing is knowing to ignore . good morning\n", 168 | "danny says he spent over\n", 169 | "happy new year! n is about to be shook! url\n", 170 | "customize your own love locket with me! any style get n off to customize a love locket url\n", 171 | "de cabeza happy new year n holiday inn matamoros url\n", 172 | "happy new years\n", 173 | "and i am happy and thankful that we have those long talks. i'm also humbled by your kind words.\n", 174 | "if you want followers\n", 175 | "day n of n thank you god for allowing me to see this day\n", 176 | "tim is sleep amp i wanted to fall asleep on the phone\n", 177 | "happy new year here's to many more amazing memories\n", 178 | "welcome n! 
and happy url\n", 179 | "gain visit url january n,\n", 180 | "if your girl popping, why not show her off , niggas be sleep on they own girl smh\n", 181 | "happy new year looking forward to new blessings, lessons, people, places, amp overall journeys!\n", 182 | "happy new year twitter fam!!\n", 183 | "idk why legit everyone tweeting don't drive drunk tn like don't drive drunk any night\n", 184 | "this is the effect has on my life url\n", 185 | "so my friend sent me a message and i'm emotional right now, she's so precious\n", 186 | "i wish .\n", 187 | "happy new year! god bless\n", 188 | "we can't wait to see you at the theatres this diwali!!! book your tickets now on url\n", 189 | "january february march april may june july august september october november december if you know it's been god\n", 190 | "papa, please follow me, for my year to sta happy, i know you can!!!! eu te amo\n", 191 | "happy new year!!\n", 192 | "youre welcome ily\n", 193 | "i wanna go home and go to sleep\n", 194 | "i have butterfly's\n", 195 | "wish i was packing for passion conf.\n", 196 | "this me exactly url\n", 197 | "lmfao. buyeke babes url\n", 198 | "happy birthday u stud\n", 199 | "wonder how many more drunk calls im gonna get\n", 200 | "bashers slowly by slowly makin' my day\n", 201 | "i always use . going to buy the himalaya men skin care range so that men in the house also have healthy skin.\n", 202 | "happy new year friends url\n", 203 | "don't forget to wrap it up, twice.!! yes, i went there. do you, but be safe when you do you, key .\n", 204 | "this stray dog just jumped in the at and won't get out !\n", 205 | "happy new year\n", 206 | "rrvote ariana grande lady gaga sia\n", 207 | "rrvote ariana grande lady gaga sia\n", 208 | "preach!!!\n", 209 | "y'all my ex a real platypus... 
she not a duck, she a platypus.\n", 210 | "i'm probably in yo city\n", 211 | "thank you sis\n", 212 | "it's finally n url\n", 213 | "forever url\n", 214 | "thanks friend url\n", 215 | "well n sure just started out fantastic....\n", 216 | "tbh n wasn't a bad year for me i've accomplished a lot so hopefully i'll continue on this path for n\n", 217 | "i'm not single i'm in a relationship with freedom.\n", 218 | "all i see are is asses shaking on my snapchat so i'm off that for the night.\n", 219 | "in sya allah. gcp will try to get more starting now!!! kuantan, pahang url\n", 220 | "n off each of first n uber rides. use code ubertheatrix to sign up. details url\n", 221 | "me right now to n url\n", 222 | "no but fr fr\n", 223 | "love you\n", 224 | "nope. i know he likes telling stories\n", 225 | "happy new year!\n", 226 | "compassion is free. respect is earned.\n", 227 | "i really need to make some life goals bc i'm lonely. please reply!!!? emma. musically url\n", 228 | "thank god for my friends that even though i barely see them, they're always there\n", 229 | "happy new year n!\n", 230 | "this was not the plan , but hello n\n", 231 | "n... was the most happiest,depressing,and adventurous year for me but another day is a good day if you keep moving forward\n", 232 | "incredibly wonderful! url\n", 233 | "i must've skipped being n url\n", 234 | "happy new year\n", 235 | "n has been filled with so many adventures and blessings for the magcon family! can't wait to see where n takes us!\n", 236 | "happy new year\n", 237 | "yall i'm so ready for graduation! may nnd where are you?\n", 238 | "i texted my n friends and i'm done\n", 239 | "home is where the heart is. i'm so grateful to have had the opportunity to come out to vegas url\n", 240 | "happy new year!! n n n n senior prom, birthday, graduation, college . 
this is the year i've been waiting for\n", 241 | "instead of trying to learn, i sit on my shitty little throne of sexism and islamophobia\n", 242 | "oakland stay lit\n", 243 | "you know it bitch\n", 244 | "this makes me so happy! url\n", 245 | "u hv proved again and again that you are tom! darpok!\n", 246 | "you're a wonderful person and have an amazing talent, dude. dont let them get to you. funny video tho\n", 247 | "n month psn giveaway to a random commenter on my new video rare url winner select\n", 248 | "i hope this year is as lit as the last. here's to n url\n", 249 | "if you bringing in new years broke get a fucking job n\n", 250 | "i'm really not gonna deal with this this year\n", 251 | "rt if n is gonna be ur year\n", 252 | "no waves, no surfboard? no problem mangrove beach resort, subic bay url\n", 253 | "ppvote ariana grande lady gaga sia\n", 254 | "ppvote ariana grande lady gaga sia\n", 255 | "ppvote ariana grande lady gaga sia\n", 256 | "n !!!.....and goodnight\n", 257 | "so i walk into the room to find my cat sitting on the bed watching tv by himself url\n", 258 | "here gt trump chick url\n", 259 | "class of n our year is coming up\n", 260 | "she hid under the bed to test her relationship. what her boyfriend did had her floored url\n", 261 | "brought in the new year with my best friend and our biggest fan rowan university url\n", 262 | "i just wanna go home\n", 263 | "lmao i love you\n", 264 | "happy new year mandy!\n", 265 | "new years follow spree like for a follow\n", 266 | "i cannot wait!! i've been counting down the days\n", 267 | "they will release it tomorrow i guess, since today is sunday.\n", 268 | "hey louis happy new year. for you n\n", 269 | "availll unn. yukk join sinbee\n", 270 | "welcome n\n", 271 | "hais look at this cutie started off my year right url\n", 272 | "fishy fishy\n", 273 | "you forgot to put alpha in quotes.\n", 274 | "if music s too loud, you re too old.\n", 275 | "what???? 
url\n", 276 | "happy new year joey bear!!!\n", 277 | "n more hour until new years west coast time\n", 278 | "i'm getting worst.\n", 279 | "haha reading this while sipping coffee frm trump tank mug gt url trump won thri\n", 280 | "hmm. leaga it's already n, how's third movie?\n", 281 | "happy nth birthday canada!!\n", 282 | "happy new year all! thank you for all the memories this n let's create more wonderful memories in n\n", 283 | "happy new year then url\n", 284 | "have to give thanks to togo for this first meal of n don't know where i'd be without such food.\n", 285 | "thanks .\n", 286 | "dvote ariana grande lady gaga sia\n", 287 | "class of n our year is coming up\n", 288 | "url aerospace amp leisure portal discover our aerospace and leisure world url\n", 289 | "happy new year !!!! url\n", 290 | "i love my friends that made n good\n", 291 | "n more fuckin hour !\n", 292 | "good morning and happy new year guys\n", 293 | "timezones are literally so weird. it's almost n in the blue states but n in the red states url\n", 294 | "candy url\n", 295 | "ikr but there's worse questions\n", 296 | "precious nyo senior super darling boy is nyo, n lbs, url mx he's looking for his a url\n", 297 | "us n n ukraine style womens autumn sexy url url\n", 298 | "i've got a pocket full of sunshine url\n", 299 | "someone take batteries out url\n", 300 | "it's refreshing to see the transformation of a blind follower\n", 301 | "with carla url\n", 302 | "body count reset activated. i am a virgin again\n", 303 | "just watched the mariah carey performance. so glad it's still n in my timezone cause i ain't wanna start the year off\n", 304 | "shit happens have a happy and healthy new year everybody! here's to making more headlines in n url\n", 305 | "so many people changing in a few hours can't wait to meet y'all\n", 306 | "shake hands... or not. 
url\n", 307 | "all me to death happy new year url\n", 308 | "he just made my heart melt url\n", 309 | "couldn't even make it a whole n minutes into the damn new year\n", 310 | "ah shit my head pounding\n", 311 | "happy new years blessed to see another one\n", 312 | "forgive em url\n", 313 | "happy new year everyone!!! be safe out there!\n", 314 | "lmao my dawg, i love you bra url\n", 315 | "xiami made a yixing inspired star train because the digital sales of yixing on xiami already surpassed nm sexcybaek\n", 316 | "this show so dumb url\n", 317 | "fate!\n", 318 | "excited for n\n", 319 | "how do you choose between two rihanna's if she was a twin url\n", 320 | "this is\n", 321 | "happy new year everyone! n\n", 322 | "i ain't eat since last year\n", 323 | "happy new years to all my friends amp followers url\n", 324 | "n kpn is waiting for you\n", 325 | "the fam is jamming hard to music\n" 326 | ] 327 | } 328 | ], 329 | "source": [ 330 | "with open('data/extracted.list') as fin:\n", 331 | " for _ in range(200):\n", 332 | " t = json.loads(fin.readline())\n", 333 | " print(preprocess(t['text']))" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [ 342 | "len(samples['😂'])" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 23, 348 | "metadata": {}, 349 | "outputs": [ 350 | { 351 | "name": "stdout", 352 | "output_type": "stream", 353 | "text": [ 354 | "RT @AllyBrooke: HAPPY NEW YEAR!! 🎆🎈 🎉🥂 https://t.co/Nk5xeQAO5t\n", 355 | "RT @Whoadieeupnext: 💁🏽🙄 like.... https://t.co/t81GzcvfZ7\n", 356 | "FINALLY 2017 !!!!! Happy New Year from Mountain Standard Time !!!!!!!!!! 
🎉🎉🎉\n", 357 | "When you say you're just gonna take a sip of wine and suddenly you're on your 4th glass 🍷\n", 358 | "KING OF POPSPLIT 👑\n", 359 | "https://t.co/HdbX7luOUb\n", 360 | "#agario #FUNNY #TURKEY #europe #game #trolling #POKEMONGO #NEW… https://t.co/VNEx4CXcFM\n", 361 | "HAPPY NEW YEARS!!! 💞✨\n", 362 | "RT @seIfcritics: Happy New Years ✨💕 https://t.co/K0o0hACXdj\n", 363 | "RT @BrandonGottFans: Happy New Year🌎❤️\n", 364 | "RT @IntolerantMano2: Friends....\n", 365 | "On a new day of 2017 I wnts to say,\n", 366 | "\n", 367 | "Open Your Blind Eyes Sickular Hindu\n", 368 | "Really they r in trouble😏 https:/…\n", 369 | "RT @mrblackcat1069: Lionel Richie \n", 370 | "All Night Long 🔥🔥🔥\n", 371 | "#Hello2017\n", 372 | "#Inspiration \n", 373 | "#Motivation \n", 374 | "#MrBlackCat1069 \n", 375 | "#NewDay… https://t.co/tbrNxL9Y…\n", 376 | "orbitCounter++\n", 377 | "\n", 378 | "🎉\n", 379 | "RT @TantrumDealo: i want your attention, but I won't beg for it 💁\n", 380 | "Henlo 2015😎\n", 381 | "RT @clariecazares18: @Robin_nicole71 & I are so mean but we don't give a fuck 😂💀🤢\n", 382 | "RT @cutie_peaz: She Hid Under The Bed To Test Her Relationship. What Her Boyfriend Did Had Her Floored\n", 383 | "😊😊😍😍😚😚😈😈\n", 384 | "https://t.co/QkGwClRWpA\n", 385 | "Can't figure out if it's the end or beginning 🐢\n", 386 | "RT @TTLYTEALA: Happy New Years from the savage 🎉🎊🎀 https://t.co/3kaCZb7c3S https://t.co/1olzFcNFW0\n", 387 | "RT @Danisource: [INSTAGRAM] cookiemonster3783 - With @shannonpix13 ♥♥ http://t.co/ck8MvyKz09 http://t.co/kB5G8pqutl\n", 388 | "RT @noonieenuu: Happy New Years. 
be safe & stay blessed ❤️❤️❤️❤️❤️\n", 389 | "RT @22goraw: highly blessed to see another year👼🏽💔\n", 390 | "If I said I was at CHURCH, don't ask me what my religion is!🙄\n", 391 | "RT @BindiIrwin: Wishing our beautiful little wombat, Wattle, a very happy 1st birthday ❤️ @AustraliaZoo https://t.co/FtkLr8WG5f\n", 392 | "RT @hamkamaulana83: oiya, happy new year 2017 😅 #ClashRoyal [pic] — https://t.co/817LeLBWd5\n", 393 | "RT @samantha__m14: hope 2017 gives me someone loyal😅\n", 394 | "\"Magic comes from what is inside you.\" - Jim Butcher 🌟 #alfiewhattam #alfiegwhattam #magic #magician #magictrick #illusion #quote #inspire\n", 395 | "fuck it up then babe ✨😜 https://t.co/UJxMZvW02q\n", 396 | "NEW YEAR, NEW SOCKS :) \n", 397 | "All styles available at https://t.co/ux5pAMjipl 😜 💩\n", 398 | "Join the Sock Game https://t.co/uFEvVd5kOA\n", 399 | "Best thing about 2016 😂😂😭@manuelabands https://t.co/09gSXjoyam\n", 400 | "RT @VibesWithBabe: When you and babe only bout each other 😍💕💕 https://t.co/FzStBZPqBo\n", 401 | "Ready🖤. https://t.co/9vvSoh5PDa\n", 402 | "RT @ItsCortnieee: Awe I love you too 💗💗 https://t.co/9IPPFgM4iQ\n", 403 | "RT @GurtTheGreat: All I wanted was a New Years kiss 😭\n", 404 | "RT @VictoriaAliyahh: So who gon wife me?😋🤗 https://t.co/pbQPvpNLkD\n", 405 | "My lil brother arrived at el a few hours ago and he wants to go back home already. Udikiwe yiMonti 😂 https://t.co/MuuoWXJs9U\n", 406 | "RT @mimie4ever: @therealdae_ u clever son😂😂😂😂\n", 407 | "RT @HarmonyWithTini: @AllyBrooke ARE YOU A TOWER, BECAUSE EIFFEL FOR YOU. 🌹\n", 408 | "I REALLY WANNA REPORT THIS ACC SO I CAN HAVE MY USERNAME 😩😩 https://t.co/w4yJslVGUN\n", 409 | "@Adam_Gartz thanks lil bro 💦\n", 410 | "I don't need no friends solo mis 12 uvas 🍇 #NewYears\n", 411 | "QQVOTE\n", 412 | "#VideoMTV2016 Ariana Grande 💎 Lady Gaga 💎 Sia \n", 413 | "#NOW2016 #NOWArianaGrande\n", 414 | "RT @GawdTrill: Rt if you at home tonight🙋🏽‍♂️\n", 415 | "RT @bishnoikuldeep: The art of knowing is knowing to \"IGNORE\". 
Good morning 🙏\n", 416 | "Danny says he spent over 💯 😆😂🤣☺️😆😂\n", 417 | "RT @MisterPreda: HAPPY NEW YEAR! 🍾🥂 2017 IS ABOUT TO BE SHOOK! 🎉 https://t.co/YoHu6hoq4z\n", 418 | "🎉 Customize your own love locket with me! Any style + get 10% off\n", 419 | "🎉 To Customize a love locket… https://t.co/JuFoyn5mHA\n", 420 | "De cabeza 🎉😂🍻 Happy New Year 2017🎆🍸 @ Holiday Inn Matamoros https://t.co/Iwm520RTCT\n", 421 | "RT @rranae_: happy new years😆🎉\n", 422 | "@l8dm2016 and I am happy and thankful that we have those long talks. I'm also humbled by your kind words. 🙏🏻\n", 423 | "RT @___xM_G_W_Vx___: IF\n", 424 | "\n", 425 | "YOU\n", 426 | "\n", 427 | "WANT\n", 428 | "\n", 429 | "FOLLOWERS\n", 430 | "\n", 431 | "➡️#MGWV\n", 432 | "\n", 433 | "➡️#RETWEET\n", 434 | "\n", 435 | "➡️#FOLLOWTRICK\n", 436 | "\n", 437 | "➡️#TEAMFOLLOWBACK\n", 438 | "\n", 439 | "➡️#ANOTHERFOLLOWTRAIN\n", 440 | "\n", 441 | "➡️#FOLLOW ☞ @…\n", 442 | "RT @essence_imani: day 1 of 365 thank you God for allowing me to see this day 🙏🏽🙌🏽\n", 443 | "Tim is sleep & I wanted to fall asleep on the phone 😩\n", 444 | "HAPPY NEW YEAR here's to many more amazing memories 🎉🌟💜\n", 445 | "RT @atbfinancial: Welcome 2017!\n", 446 | "And Happy #Canada150\n", 447 | "🇨🇦🎉 https://t.co/iGisGXCsNr\n", 448 | "#Retweet Gain #Followers Visit https://t.co/XYPypRBF0k ✔#TakenBySurprise ✔#TeamFollowBack ✔#90sBabyFollowTrain ✔#Rt✔#FF✔#TFBJP January 01,…\n", 449 | "RT @DayyGotti: If your girl popping, why not show her off 😍😍 , niggas be sleep on they own girl smh\n", 450 | "Happy New Year 💓 Looking forward to new blessings, lessons, people, places, & overall journeys!\n", 451 | "happy new year twitter fam!! 
🍾🍾\n", 452 | "RT @Joshcabral7: Idk why legit everyone tweeting don't drive drunk tn like don't drive drunk any night 😂😂😂\n", 453 | "RT @avi_ously_ptx: This is the effect @mitchgrassi has on my life ❤ https://t.co/fQ84Tbr1vo\n", 454 | "RT @bieberxstrings: So my friend sent me a message and I'm emotional right now, she's so precious 🙌🏻😩\n", 455 | "@Memoxa_ I wish 😞😞💔.\n", 456 | "RT @BrielleZolciak: happy new year! God Bless💙\n", 457 | "RT @AnushkaSharma: We can't wait to see you at the theatres this Diwali!!! 😀 Book your tickets now on https://t.co/VrIT9YaiUw… \n", 458 | "RT @omoissy: 👉January\n", 459 | "👉February\n", 460 | "👉March\n", 461 | "👉April\n", 462 | "👉May\n", 463 | "👉June\n", 464 | "👉July\n", 465 | "👉August\n", 466 | "👉September\n", 467 | "👉October\n", 468 | "👉November\n", 469 | "👉December\n", 470 | "RT if you know it's been God\n", 471 | "@WeLuvAllyB PAPA, PLEASE FOLLOW ME, FOR MY YEAR TO START HAPPY, I KNOW YOU CAN!!!! EU TE AMO 💜\n", 472 | "HAPPY NEW YEAR!!🥂🍾📸🎊🎉\n", 473 | "@sleepychaes YOURE WELCOME ILY 🍑\n", 474 | "I wanna go home and go to sleep 😞\n", 475 | "I have butterfly's 😭😍💟\n", 476 | "Wish I was packing for Passion Conf. 😭\n", 477 | "This me exactly 🙁 https://t.co/BlSUP39PL2\n", 478 | "😂😂😂😂 lmfao. Buyeke babes https://t.co/6mKvYDe7Ow\n", 479 | "RT @leocolon09: @glo_jasminexo happy birthday u stud 💯😜\n", 480 | "wonder how many more drunk calls im gonna get 😂\n", 481 | "bashers slowly by slowly makin' my day😂\n", 482 | "I always use @HimalayaIndia ❤. Going to buy the Himalaya men skin care range so that men in the house also have healthy skin. \n", 483 | "#SomethingNew\n", 484 | "RT @goonj_sobtian: Happy New Year Friends 😙😍💕🍷🍻🍸🍹🍫🍦 https://t.co/PAv7B7NdRM\n", 485 | "Don't forget to wrap it up, twice.!! 😬 yes, I went there. Do you, but be safe when you do you, key*.\n", 486 | "this stray dog just jumped in the at and won't get out ! 
😂\n", 487 | "RT @KingAmiyahScott: Happy New Year 🎉\n", 488 | "RRVOTE\n", 489 | "#VideoMTV2016 Ariana Grande 💎 Lady Gaga 💎 Sia \n", 490 | "#NOW2016 #NOWArianaGrande\n", 491 | "RRVOTE\n", 492 | "#VideoMTV2016 Ariana Grande 💎 Lady Gaga 💎 Sia \n", 493 | "#NOW2016 #NOWArianaGrande\n", 494 | "@ashleymac98 preach!!!👌🙌\n", 495 | "y'all my ex a real platypus... \n", 496 | "\n", 497 | "she not a duck, she a platypus.👎🏽\n", 498 | "I'm probably in yo city😈\n", 499 | "@YandaNkohla thank you sis ♥️♥️♥️😘😘😘\n", 500 | "It's finally 2017 😂 https://t.co/6J8vEUOqgF\n", 501 | "RT @hxhchains: Forever ♡ https://t.co/ngM4S1aj1M\n", 502 | "Thanks Friend 💀❤ https://t.co/wOXH9Jyd77\n", 503 | "Well 2017 sure just started out fantastic.... 🙃\n", 504 | "Tbh 2016 wasn't a bad year for me I've accomplished a lot so hopefully I'll continue on this path for 2017😊😁💯\n", 505 | "I'm not single \n", 506 | "\n", 507 | "I'm in a relationship with freedom. 👑✨\n", 508 | "All I see are is asses shaking on my Snapchat so I'm off that for the night. 🙂\n", 509 | "In sya Allah. GCP will try to get more starting now!!! 😎😎😎 @ Kuantan, Pahang https://t.co/BVXf3h9gPw\n", 510 | "$10 off each of first 5 Uber rides. Use code ‘ubertheatrix’ to sign up. #NewYearsEve #saferide #free 🍷🍸🍷🍸Details: https://t.co/OJSG2rADzp\n", 511 | "RT @JusstLeah: Me right now to 2016 ✨ https://t.co/568pkgqxke\n", 512 | "@dimesavusa @nateezzy no but fr fr 😭💀😂\n", 513 | "@njhxhugs love you 💕\n", 514 | "@LimLim_ Nope. I Know He Likes Telling Stories 😂😂😂\n", 515 | "@willysherman12 Happy New Year! 🎉🍾\n", 516 | "RT @Ju_gxddess: Compassion is free. Respect is earned.👌🏽\n", 517 | "@jacobsartorius I really need to make some life goals bc I'm lonely.😂🦄❤️😘 Please reply!!!? •Emma. 
Musically… https://t.co/Xa2EBHQQ03\n", 518 | "Thank god for my friends that even though I barely see them, they're always there✨\n", 519 | "RT @isyanasarasvati: Happy New Year 2017!🌞\n", 520 | "This was not the plan 😂💀 , but hello 2017😭💀\n", 521 | "2016....\n", 522 | "Was the most happiest,depressing,and adventurous year for me but another day is a good day if you keep moving forward 👀🙏🏾💯#blessed\n", 523 | "RT @MaySongsBeWithU: Incredibly wonderful!\n", 524 | "🌿💃🎍💃🌿 https://t.co/vJ74byA8TV\n", 525 | "RT @Bebelleyy: I must've skipped being 15 😪 https://t.co/ip2BE1n2xH\n", 526 | "@THE_JEONGGUK HAPPY NEW YEAR 💛\n", 527 | "RT @MAGCONTOUR: 2016 has been filled with so many adventures and blessings for the MAGCON family! Can't wait to see where 2017 takes us! 🙌🏻\n", 528 | "💕❤️Happy New Year 💕🌟👯\n", 529 | "RT @thispeasantlife: Yall I'm so ready for graduation! May 22nd where are you? 🤗🎓\n", 530 | "RT @StraitThuggn: I texted my 6 friends and I'm done 😂😂😂😩\n", 531 | "home is where the heart is. ❤ I'm so grateful to have had the opportunity to come out to Vegas… https://t.co/WnooH9LSkX\n", 532 | "RT @miarixo_: Happy New Year!! 2️⃣0️⃣1️⃣7️⃣: Senior Prom, 🔞 Birthday, Graduation, College . This is the year I've been waiting for 😩🙌🏾\n", 533 | "RT @omobolajiolayio: @hiimtania @Goodnght3rdside instead of trying to learn, I sit on my shitty little throne of sexism and Islamophobia😊\n", 534 | "RT @AJAsoau: Oakland stay lit 👏\n", 535 | "@Neongh0sts you know it bitch 😩😩😩\n", 536 | "This makes me so happy! 💕 #EvasHolidayGiveaway https://t.co/QFUzB1QjuZ\n", 537 | "RT @bhaskar_k_: @My_Tweet_Says u hv proved again and again that YOU ARE TOM! Darpok! 😊😊😊\n", 538 | "@mani_1237 @geeta_phogat @BabitaPhogat\n", 539 | "@prozdkp You're a wonderful person and have an amazing talent, dude. Dont let them get to you. 
Funny video tho 👍\n", 540 | "RT @PinataBoomBoom: 3 Month PSN giveaway 🎁 to a random commenter ⌨️ on my new video \" RARE \" https://t.co/JHgwSnpMdH ❤️ \n", 541 | "\n", 542 | "WINNER select… \n", 543 | "I hope this year is as lit as the last. Here's to 2017 🎇 https://t.co/PGEUCh36w6\n", 544 | "RT @smokecamp_chino: If you bringing in New Years broke get a fucking job 2017 😎\n", 545 | "I'm really not gonna deal with this this year 🙄\n", 546 | "RT @snowgloww: rt if 2017 is gonna be ur year 😛😛\n", 547 | "No waves, no surfboard? No problem 👍🏼 @ Mangrove Beach Resort, Subic Bay https://t.co/UfolToQQ6i\n", 548 | "PPVOTE\n", 549 | "#VideoMTV2016 Ariana Grande 💎 Lady Gaga 💎 Sia \n", 550 | "#NOW2016 #NOWArianaGrande\n", 551 | "PPVOTE\n", 552 | "#VideoMTV2016 Ariana Grande 💎 Lady Gaga 💎 Sia \n", 553 | "#NOW2016 #NOWArianaGrande\n", 554 | "PPVOTE\n", 555 | "#VideoMTV2016 Ariana Grande 💎 Lady Gaga 💎 Sia \n", 556 | "#NOW2016 #NOWArianaGrande\n", 557 | "RT @i_love_to_hoop: 2017 !!!.....and goodnight 💤\n", 558 | "RT @7_3Powestroke: So I walk into the room to find my cat sitting on the bed watching tv by himself😂😂 https://t.co/X5g0q8Dcwk\n", 559 | "RT @JustMy_NameHere: @AwezomeSwe #800 here--> ☆TRUMP☆CHICK☆\n", 560 | "#HAPPYNEWYEAR https://t.co/byivbtPk8O\n", 561 | "RT @Joshuacole__: Class of 2018 our year is coming up😭💪🏾🎓🎓🎓🎓🎓\n", 562 | "RT @Doubleceee: She Hid Under The Bed To Test Her Relationship. What Her Boyfriend Did Had Her Floored\n", 563 | "😊😊😍😍😚😚😈😈\n", 564 | "https://t.co/D5dJ7gC8pl\n", 565 | "Brought in the new year with my best friend and our biggest fan😘 @ Rowan University https://t.co/6pgRH4DWeZ\n", 566 | "I just wanna go home 🙂\n", 567 | "@Project_Ander lmao I love you 😂❤🎊\n", 568 | "RT @BleedblueTammy: @Mandypants_34 Happy New Year Mandy! 🎉🍻\n", 569 | "RT @CelestiaVega: NEW YEARS FOLLOW SPREE 🎉\n", 570 | "\n", 571 | "LIKE + RT FOR A FOLLOW\n", 572 | "@cspacat I CANNOT WAIT!! 
I've been counting down the days 😍\n", 573 | "@petiteyoona They will release it tomorrow I guess, since today is Sunday.☺\n", 574 | "@Louis_Tomlinson Hey louis Happy new year. For you 2017 💋💋💋💋💋💋💋❤❤❤❤❤❤❤😻💋💋💋❤❤❤❤❤\n", 575 | "@haeunrps availll unn. Yukk join😉 -sinbee\n", 576 | "🎉 Welcome 2017 🎉\n", 577 | "hais look at this cutie started off my year right 😄 https://t.co/kdAe4QMG9S\n", 578 | "Fishy fishy 😴😴\n", 579 | "@PolitikMasFina - you forgot to put \"Alpha\" in quotes. 😂\n", 580 | "If music’s too loud, you’re too old. 🔉\n", 581 | "WHAT???? 😂😂😂😂😂 https://t.co/suXI8n3Pv7\n", 582 | "@eggeuscramble happy new year joey bear!!! 💗\n", 583 | "1 more hour until New Years 💋 west coast time\n", 584 | "I'm getting worst. 😞\n", 585 | "RT @Jehsouzars: @JordanUhl 💥💥HAHA reading this while sipping coffee frm TRUMP TANK mug 😉> https://t.co/xsKXMVXLWp Trump WON THRI… \n", 586 | "Hmm. LeAga it's already 2017, how's third movie? 😭😭\n", 587 | "Happy 150th birthday Canada!! 😊😊😊😊😊😊😊😊\n", 588 | "RT @Kpop_Polling: Happy New Year all! Thank you for all the memories this 2016. Let's create more wonderful memories in 2017 🙌🎊🎉🎇🎆\n", 589 | "RT @marioesus: HAPPY NEW YEAR THEN 💖 https://t.co/0TckHUSFUZ\n", 590 | "Have to give thanks to Togo for this first meal of 2017. Don't know where I'd be without such food. 🙌🏾\n", 591 | "@liljaay___ thanks🖤.\n", 592 | "DVOTE\n", 593 | "#VideoMTV2016 Ariana Grande 💎 Lady Gaga 💎 Sia \n", 594 | "#NOW2016 #NOWArianaGrande\n", 595 | "RT @Joshuacole__: Class of 2018 our year is coming up😭💪🏾🎓🎓🎓🎓🎓\n", 596 | "https://t.co/pSzvrbJViu ✈ AEROSPACE & LEISURE PORTAL ✈ Discover our Aerospace and Leisure World ✈ https://t.co/nV9tSMRa3g\n", 597 | "Happy new year !!!! 🎉🎉🎉🎉🎁🎁🎁🎈🎈 https://t.co/ZUSjIJrPPD\n", 598 | "RT @marisaanicolee7: I love my friends that made 2016 good 💖💞💛\n", 599 | "1 more fuckin hour ! 😤😤😤\n", 600 | "Good morning and Happy New Year guys 💖💖\n", 601 | "RT @fxcknicolas: timezones are literally so weird. 
it's almost 2017 in the blue states but 1924 in the red states 😩 https://t.co/kxag2UM53F\n", 602 | "RT @spanishcvndy: candy 🍭 https://t.co/yyDCJ8UECT\n", 603 | "@FCMessiolona ikr but there's worse questions 🌚\n", 604 | "🚨🆘PrEcIoUs 14yo SENIOR SUPER #URGENT \n", 605 | "🚨Darling boy is 14yo, 35 lbs, GSD/Bgl mx\n", 606 | "🚨He's looking for his #foreverhome A… https://t.co/zixbFPbQDF\n", 607 | "⚡ US $8.45\n", 608 | "2016 Ukraine Style Womens Autumn Sexy #fall #maxi #dresses #ukrainedress\n", 609 | "https://t.co/dUFeSeOXM1 https://t.co/RrBPnQL9Ob\n", 610 | "RT @GlamourPosts: I've got a pocket full of sunshine 😍❤️💕 https://t.co/TEQytzs4fT\n", 611 | "Someone take @raven_kiara batteries OUT😂😂 https://t.co/19zwEbZPAl\n", 612 | "RT @iRuchi_1: @madhukishwar it's refreshing to see the transformation of a blind follower 😊\n", 613 | "with/ Carla😁 https://t.co/VraQZxwang\n", 614 | "RT @baefromtexas: body count reset activated. i am a virgin again 🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾\n", 615 | "RT @FunnyMaine: Just watched the Mariah Carey performance. 🙃 So glad it's still 2016 in my timezone cause I ain't wanna start the year off…\n", 616 | "RT @MariahCarey: Shit happens 😩 Have a happy and healthy new year everybody!🎉 Here's to making more headlines in 2017 😂 https://t.co/0Td8se…\n", 617 | "RT @babyjj__: So many people changing in a few hours can't wait to meet y'all 😂😂😂💀\n", 618 | "RT @SportsCenter: Shake hands... or not. 😳 https://t.co/lcSSphNvo4\n", 619 | "All me@love@me to death😍😍😍😍😘😘😘😘❤️❤️❤️ happy new year https://t.co/59G5UbKyxK\n", 620 | "RT @ohwessicaann: He just made my heart melt 😭❤️ https://t.co/X0X8B57ZSy\n", 621 | "couldn't even make it a whole 20 minutes into the damn New Year🙄\n", 622 | "Ah shit my head pounding 🤕😂\n", 623 | "RT @rkwilli35: Happy New Years🎊 blessed to see another one 🙏🏽\n", 624 | "RT @ardey_law: 😂😂😂forgive em https://t.co/SzZp2Qm9dW\n", 625 | "RT @MapsMaponyane: Happy New Year Everyone!!!🎉🎊 Be Safe Out There! 
🙏 #2017\n", 626 | "RT @lilblacklion: Lmao my dawg, I love you bra 😂❤ https://t.co/V40q7PkKGK\n", 627 | "XIAMI made a Yixing-inspired \"Star Train\" because the digital sales of Yixing on Xiami already surpassed 1M🎉\n", 628 | "(sexcybaek)\n", 629 | "RT @cartoonmoji: This show so dumb 😂😂💪🏽 #CartoonGeek https://t.co/nYuHdrrdnF\n", 630 | "@dullemarulle Fate! ♡\n", 631 | "RT @TTLYTEALA: Excited for 2017 🎉🎊🎉\n", 632 | "RT @BlackPplVines: How do you choose between two Rihanna's if she was a twin 😂😂https://t.co/Je5D6AuLrj\n", 633 | "RT @GSDinSeattle: This is #SoTrue 🐺❤️🐾☺️#friendshipgoals #familytime #holidayswithheart #dogs #dogsarelove #soulmates #love… \n", 634 | "Happy new year everyone!🎉 2017\n", 635 | "RT @BriSteele_: I ain't eat since last year 😂😂\n", 636 | "RT @MrEdTrain: Happy New Years To all my friends & followers 🎉🍾🌹🇺🇸https://t.co/12hFYTQnRv\n", 637 | "2017 kp4 is waiting for you 👀 @katyperry #welcomekp4\n", 638 | "RT @dbmtz89: The fam is jamming hard to @xoxomariel_ music 😂\n" 639 | ] 640 | } 641 | ], 642 | "source": [ 643 | "with open('data/extracted.list') as fin:\n", 644 | " for _ in range(200):\n", 645 | " t = json.loads(fin.readline())\n", 646 | " print(t['text'])\n" 647 | ] 648 | } 649 | ], 650 | "metadata": { 651 | "kernelspec": { 652 | "display_name": "Python 3", 653 | "language": "python", 654 | "name": "python3" 655 | } 656 | }, 657 | "nbformat": 4, 658 | "nbformat_minor": 2 659 | } 660 | -------------------------------------------------------------------------------- /emoji_demo/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/h4x3rotab/emoji-tf-ios/e84a0ec65f9020c720b685ba47ce9c7dae3d21d9/emoji_demo/.DS_Store -------------------------------------------------------------------------------- /emoji_demo/AppDelegate.h: -------------------------------------------------------------------------------- 1 | // Copyright 2015 Google Inc. All rights reserved. 
2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 14 | 15 | #import 16 | 17 | @interface AppDelegate : UIResponder 18 | 19 | @property (strong, nonatomic) UIWindow *window; 20 | 21 | @end 22 | -------------------------------------------------------------------------------- /emoji_demo/AppDelegate.mm: -------------------------------------------------------------------------------- 1 | // Copyright 2015 Google Inc. All rights reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 
14 | 15 | #import "AppDelegate.h" 16 | 17 | #import "RunModelViewController.h" 18 | 19 | @implementation AppDelegate 20 | 21 | - (BOOL)application:(UIApplication *)application 22 | didFinishLaunchingWithOptions:(NSDictionary *)launchOptions { 23 | 24 | UITabBarController *bar = [[UITabBarController alloc] init]; 25 | [bar setViewControllers: 26 | @[[[RunModelViewController alloc] init]]]; 27 | bar.selectedIndex = 0; 28 | self.window = [[UIWindow alloc] initWithFrame:[[UIScreen mainScreen] bounds]]; 29 | self.window.rootViewController = bar; 30 | [self.window makeKeyAndVisible]; 31 | return YES; 32 | } 33 | 34 | - (void)applicationWillResignActive:(UIApplication *)application {} 35 | 36 | - (void)applicationDidEnterBackground:(UIApplication *)application {} 37 | 38 | - (void)applicationWillEnterForeground:(UIApplication *)application {} 39 | 40 | - (void)applicationDidBecomeActive:(UIApplication *)application {} 41 | 42 | - (void)applicationWillTerminate:(UIApplication *)application {} 43 | 44 | @end 45 | -------------------------------------------------------------------------------- /emoji_demo/RunModel-Info.plist: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | CFBundleDevelopmentRegion 6 | en 7 | CFBundleDisplayName 8 | tf_ios_makefile_example 9 | CFBundleExecutable 10 | tf_ios_makefile_example 11 | CFBundleIdentifier 12 | Google.RunModel 13 | CFBundleInfoDictionaryVersion 14 | 6.0 15 | CFBundleName 16 | ios-app 17 | CFBundlePackageType 18 | APPL 19 | CFBundleShortVersionString 20 | 1.0 21 | CFBundleSignature 22 | ???? 
23 | CFBundleVersion 24 | 1.0 25 | LSRequiresIPhoneOS 26 | 27 | UILaunchStoryboardName 28 | RunModelViewController 29 | UIRequiredDeviceCapabilities 30 | 31 | armv7 32 | 33 | UISupportedInterfaceOrientations 34 | 35 | UIInterfaceOrientationPortrait 36 | UIInterfaceOrientationLandscapeLeft 37 | UIInterfaceOrientationLandscapeRight 38 | 39 | UISupportedInterfaceOrientations~ipad 40 | 41 | UIInterfaceOrientationPortrait 42 | UIInterfaceOrientationPortraitUpsideDown 43 | UIInterfaceOrientationLandscapeLeft 44 | UIInterfaceOrientationLandscapeRight 45 | 46 | 47 | 48 | -------------------------------------------------------------------------------- /emoji_demo/RunModelViewController.h: -------------------------------------------------------------------------------- 1 | // Copyright 2015 Google Inc. All rights reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 
14 | 15 | #import <UIKit/UIKit.h> 16 | 17 | @interface RunModelViewController : UIViewController 18 | 19 | - (void) viewDidLoad; 20 | - (IBAction) inference:(id) sender; 21 | 22 | @property (strong, nonatomic) NSDate *lastFire; 23 | @property (weak, nonatomic) IBOutlet UITextField *inputSentField; 24 | @property (weak, nonatomic) IBOutlet UITextView *urlContentTextView; 25 | 26 | @end 27 | -------------------------------------------------------------------------------- /emoji_demo/RunModelViewController.mm: -------------------------------------------------------------------------------- 1 | // Copyright 2015 Google Inc. All rights reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 
14 | 15 | #import "RunModelViewController.h" 16 | 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | 24 | #include "google/protobuf/io/coded_stream.h" 25 | #include "google/protobuf/io/zero_copy_stream_impl.h" 26 | #include "google/protobuf/io/zero_copy_stream_impl_lite.h" 27 | #include "google/protobuf/message_lite.h" 28 | #include "tensorflow/core/framework/op_kernel.h" 29 | #include "tensorflow/core/framework/tensor.h" 30 | #include "tensorflow/core/framework/types.pb.h" 31 | #include "tensorflow/core/platform/env.h" 32 | #include "tensorflow/core/platform/logging.h" 33 | #include "tensorflow/core/platform/mutex.h" 34 | #include "tensorflow/core/platform/types.h" 35 | #include "tensorflow/core/public/session.h" 36 | 37 | #include "ios_image_load.h" 38 | 39 | static bool initlized = false; 40 | void InitModel(); 41 | std::vector<std::pair<std::string, float>> RunInference(const std::string& sent); 42 | 43 | namespace { 44 | class IfstreamInputStream : public ::google::protobuf::io::CopyingInputStream { 45 | public: 46 | explicit IfstreamInputStream(const std::string& file_name) 47 | : ifs_(file_name.c_str(), std::ios::in | std::ios::binary) {} 48 | ~IfstreamInputStream() { ifs_.close(); } 49 | 50 | int Read(void* buffer, int size) { 51 | if (!ifs_) { 52 | return -1; 53 | } 54 | ifs_.read(static_cast<char*>(buffer), size); 55 | return (int)ifs_.gcount(); 56 | } 57 | 58 | private: 59 | std::ifstream ifs_; 60 | }; 61 | } // namespace 62 | 63 | @interface RunModelViewController () 64 | @end 65 | 66 | @implementation RunModelViewController { 67 | } 68 | 69 | - (void) viewDidLoad { 70 | if (!initlized) { 71 | initlized = true; 72 | InitModel(); 73 | } 74 | } 75 | 76 | - (IBAction) inference:(id) sender { 77 | std::string sent([self.inputSentField.text UTF8String]); 78 | if (sent.empty()) { 79 | return; 80 | } 81 | auto predict = RunInference(sent); 82 | NSMutableString* inference_result = [NSMutableString string]; 83 | for (const auto& tuple : predict) { 84 | const 
std::string& emoji = tuple.first; 85 | const float score = tuple.second; 86 | [inference_result appendFormat:@"%@: %.4f\n", [NSString stringWithUTF8String: emoji.c_str()], score]; 87 | } 88 | self.urlContentTextView.text = inference_result; 89 | } 90 | 91 | - (IBAction) textEdited:(id)sender { 92 | NSDate* now = [NSDate date]; 93 | if (self.lastFire == nil || [now timeIntervalSinceDate:self.lastFire] >= 0.2) { 94 | [self inference: nil]; 95 | self.lastFire = now; 96 | } 97 | } 98 | 99 | @end 100 | 101 | // Returns the top N confidence values over threshold in the provided vector, 102 | // sorted by confidence in descending order. 103 | static void GetTopN( 104 | const Eigen::TensorMap<Eigen::Tensor<float, 1, Eigen::RowMajor>, 105 | Eigen::Aligned>& prediction, 106 | const int num_results, const float threshold, 107 | std::vector<std::pair<float, int> >* top_results) { 108 | // Will contain top N results in ascending order. 109 | std::priority_queue<std::pair<float, int>, 110 | std::vector<std::pair<float, int> >, 111 | std::greater<std::pair<float, int> > > top_result_pq; 112 | 113 | const int count = (int)prediction.size(); 114 | for (int i = 0; i < count; ++i) { 115 | const float value = prediction(i); 116 | 117 | // Only add it if it beats the threshold and has a chance at being in 118 | // the top N. 119 | if (value < threshold) { 120 | continue; 121 | } 122 | 123 | top_result_pq.push(std::pair<float, int>(value, i)); 124 | 125 | // If at capacity, kick the smallest value out. 126 | if (top_result_pq.size() > num_results) { 127 | top_result_pq.pop(); 128 | } 129 | } 130 | 131 | // Copy to output vector and reverse into descending order. 
132 | while (!top_result_pq.empty()) { 133 | top_results->push_back(top_result_pq.top()); 134 | top_result_pq.pop(); 135 | } 136 | std::reverse(top_results->begin(), top_results->end()); 137 | } 138 | 139 | 140 | bool PortableReadFileToProto(const std::string& file_name, 141 | ::google::protobuf::MessageLite* proto) { 142 | ::google::protobuf::io::CopyingInputStreamAdaptor stream( 143 | new IfstreamInputStream(file_name)); 144 | stream.SetOwnsCopyingStream(true); 145 | // TODO(jiayq): the following coded stream is for debugging purposes to allow 146 | // one to parse arbitrarily large messages for MessageLite. One most likely 147 | // doesn't want to put protobufs larger than 64MB on Android, so we should 148 | // eventually remove this and quit loud when a large protobuf is passed in. 149 | ::google::protobuf::io::CodedInputStream coded_stream(&stream); 150 | // Total bytes hard limit / warning limit are set to 1GB and 512MB 151 | // respectively. 152 | coded_stream.SetTotalBytesLimit(1024LL << 20, 512LL << 20); 153 | return proto->ParseFromCodedStream(&coded_stream); 154 | } 155 | 156 | NSString* FilePathForResourceName(NSString* name, NSString* extension) { 157 | NSString* file_path = [[NSBundle mainBundle] pathForResource:name ofType:extension]; 158 | if (file_path == NULL) { 159 | LOG(FATAL) << "Couldn't find '" << [name UTF8String] << "." 
160 | << [extension UTF8String] << "' in bundle."; 161 | } 162 | return file_path; 163 | } 164 | 165 | tensorflow::Session* session_pointer = nullptr; 166 | std::unique_ptr<tensorflow::Session> session; 167 | 168 | std::vector<std::string> label_strings = { 169 | "🎉", "🎈", "🙊", "🙄", "👑", "✨", "💞", "💕", "❤", "😏", "🔥", "😎", "💀", 170 | "😂", "😍", "😊", "😈", "❤️", "💔", "😅", "🌟", "😜", "😭", "💗", "😋", "🌹", 171 | "😩", "💦", "♂", "🙏", "☺", "💯", "😆", "➡️", "🙌", "💜", "✔", "💓", "💙", 172 | "😀", "👉", "😬", "👌", "😘", "♡", "🙃", "😁", "🙂", "👀", "💃", "💛", "👏", 173 | "👍", "😛", "💪", "💋", "😻", "😉", "😄", "😴", "💥", "💖", "😤", "🚨", "⚡", 174 | "😳", "🎶", "🗣", "👅", "😫", "✌", "💚", "🙈", "😇", "😒", "😌", "❗", "😢", 175 | "😕", "👊", "🌙", "👇", "😔", "❄", "💘", "✊", "💫", "😡", "♀", "🏆", "🌸", 176 | "★", "😱", "📷", "💰", "⚽", "🐐", "✅" 177 | }; 178 | 179 | void InitModel() { 180 | tensorflow::SessionOptions options; 181 | options.config.mutable_graph_options() 182 | ->mutable_optimizer_options() 183 | ->set_opt_level(tensorflow::OptimizerOptions::L0); 184 | options.config.set_inter_op_parallelism_threads(1); 185 | options.config.set_intra_op_parallelism_threads(1); 186 | tensorflow::Status session_status = tensorflow::NewSession(options, &session_pointer); 187 | 188 | if (!session_status.ok()) { 189 | std::string status_string = session_status.ToString(); 190 | NSLog(@"Session create failed - %s", status_string.c_str()); 191 | return; 192 | } 193 | session.reset(session_pointer); 194 | LOG(INFO) << "Session created."; 195 | 196 | tensorflow::GraphDef tensorflow_graph; 197 | LOG(INFO) << "Graph created."; 198 | 199 | NSString* network_path = FilePathForResourceName(@"emoji_frozen", @"pb"); // tensorflow_inception_graph 200 | PortableReadFileToProto([network_path UTF8String], &tensorflow_graph); 201 | 202 | LOG(INFO) << "Creating session."; 203 | tensorflow::Status s = session->Create(tensorflow_graph); 204 | if (!s.ok()) { 205 | LOG(ERROR) << "Could not create TensorFlow Graph: " << s; 206 | return; 207 | } 208 | } 209 | 210 | // Generates feature 
sequence for a sentence. 211 | tensorflow::Tensor TextToInputSequence(const std::string& sent) { 212 | // Everything here should be consistent with the original Python code (tokenize_dataset.ipynb). 213 | // The magic alphabet and label_strings come from there. 214 | tensorflow::Tensor text_tensor(tensorflow::DT_INT32, tensorflow::TensorShape({1, 120})); 215 | auto tensor_mapped = text_tensor.tensor<tensorflow::int32, 2>(); 216 | tensorflow::int32* data = tensor_mapped.data(); 217 | 218 | // num_alphabet = 36 # (3+33) 219 | // num_cat = 99 # (1+98) 220 | // T_PAD = 0 221 | // T_OOV = 2 222 | const int T_START = 1; 223 | 224 | // Build alphabet. 225 | std::string alphabet = "### eotainsrlhuydmgwcpfbk.v'!,jx?zq_"; 226 | std::map<char, int> aidx; 227 | for (int i = 0; i < alphabet.length(); ++i) { 228 | aidx[alphabet[i]] = i; 229 | } 230 | // Generate seq. 231 | std::vector<int> seq; 232 | seq.push_back(T_START); 233 | for (char ch : sent) { 234 | char lower_ch = tolower(ch); 235 | if (aidx.count(lower_ch) > 0) { 236 | seq.push_back(aidx[lower_ch]); 237 | } 238 | } 239 | // Trim and pad. 240 | const int MAX_LEN = 120; 241 | int seq_len = std::min(MAX_LEN, (int)seq.size()); 242 | memset(data, 0, MAX_LEN * sizeof(int)); 243 | memcpy(data + (MAX_LEN - seq_len), seq.data(), seq_len * sizeof(int)); 244 | 245 | return text_tensor; 246 | } 247 | 248 | std::vector<std::pair<std::string, float>> RunInference(const std::string& sent) { 249 | std::vector<std::pair<std::string, float>> inference_result; 250 | // Extract feature. 251 | auto text_tensor = TextToInputSequence(sent); 252 | // Inference. 
253 | std::string input_layer = "input_1"; 254 | std::string output_layer = "dense_2/Softmax"; 255 | std::vector<tensorflow::Tensor> outputs; 256 | tensorflow::RunOptions options; 257 | tensorflow::RunMetadata metadata; 258 | tensorflow::Status run_status = session->Run( 259 | {{input_layer, text_tensor}}, {output_layer}, {}, &outputs); 260 | if (!run_status.ok()) { 261 | LOG(ERROR) << "Running model failed: " << run_status; 262 | tensorflow::LogAllRegisteredKernels(); 263 | return inference_result; 264 | } 265 | tensorflow::string status_string = run_status.ToString(); 266 | LOG(INFO) << "Run status: " << status_string; 267 | 268 | // Collect outputs. 269 | tensorflow::Tensor* output = &outputs[0]; 270 | const int kNumResults = 5; 271 | const float kThreshold = 0.005f; 272 | std::vector<std::pair<float, int>> top_results; 273 | GetTopN(output->flat<float>(), kNumResults, kThreshold, &top_results); 274 | 275 | std::stringstream ss; 276 | ss.precision(3); 277 | for (const auto& result : top_results) { 278 | const float confidence = result.first; 279 | const int index = result.second; 280 | const std::string& label = label_strings[index]; 281 | ss << index << " " << confidence << " " << label << "\n"; 282 | inference_result.emplace_back(label, confidence); 283 | } 284 | LOG(INFO) << "Predictions: " << ss.str(); 285 | return inference_result; 286 | } 287 | -------------------------------------------------------------------------------- /emoji_demo/RunModelViewController.xib: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | -------------------------------------------------------------------------------- /emoji_demo/data/.DS_Store: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/h4x3rotab/emoji-tf-ios/e84a0ec65f9020c720b685ba47ce9c7dae3d21d9/emoji_demo/data/.DS_Store -------------------------------------------------------------------------------- /emoji_demo/data/emoji_frozen.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/h4x3rotab/emoji-tf-ios/e84a0ec65f9020c720b685ba47ce9c7dae3d21d9/emoji_demo/data/emoji_frozen.pb -------------------------------------------------------------------------------- /emoji_demo/ios_image_load.h: -------------------------------------------------------------------------------- 1 | // Copyright 2015 Google Inc. All rights reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 14 | 15 | #ifndef TENSORFLOW_EXAMPLES_IOS_IOS_IMAGE_LOAD_H_ 16 | #define TENSORFLOW_EXAMPLES_IOS_IOS_IMAGE_LOAD_H_ 17 | 18 | #include 19 | 20 | #include "tensorflow/core/framework/types.h" 21 | 22 | std::vector LoadImageFromFile(const char* file_name, 23 | int* out_width, 24 | int* out_height, 25 | int* out_channels); 26 | 27 | #endif // TENSORFLOW_EXAMPLES_IOS_IOS_IMAGE_LOAD_H_ 28 | -------------------------------------------------------------------------------- /emoji_demo/ios_image_load.mm: -------------------------------------------------------------------------------- 1 | // Copyright 2015 Google Inc. 
All rights reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 14 | 15 | #include "ios_image_load.h" 16 | 17 | #include 18 | #include 19 | #include 20 | #include 21 | 22 | #import 23 | #import 24 | 25 | using tensorflow::uint8; 26 | 27 | std::vector LoadImageFromFile(const char* file_name, 28 | int* out_width, int* out_height, 29 | int* out_channels) { 30 | FILE* file_handle = fopen(file_name, "rb"); 31 | fseek(file_handle, 0, SEEK_END); 32 | const size_t bytes_in_file = ftell(file_handle); 33 | fseek(file_handle, 0, SEEK_SET); 34 | std::vector file_data(bytes_in_file); 35 | fread(file_data.data(), 1, bytes_in_file, file_handle); 36 | fclose(file_handle); 37 | CFDataRef file_data_ref = CFDataCreateWithBytesNoCopy(NULL, file_data.data(), 38 | bytes_in_file, 39 | kCFAllocatorNull); 40 | CGDataProviderRef image_provider = 41 | CGDataProviderCreateWithCFData(file_data_ref); 42 | 43 | const char* suffix = strrchr(file_name, '.'); 44 | if (!suffix || suffix == file_name) { 45 | suffix = ""; 46 | } 47 | CGImageRef image; 48 | if (strcasecmp(suffix, ".png") == 0) { 49 | image = CGImageCreateWithPNGDataProvider(image_provider, NULL, true, 50 | kCGRenderingIntentDefault); 51 | } else if ((strcasecmp(suffix, ".jpg") == 0) || 52 | (strcasecmp(suffix, ".jpeg") == 0)) { 53 | image = CGImageCreateWithJPEGDataProvider(image_provider, NULL, true, 54 | kCGRenderingIntentDefault); 55 | } else { 56 | CFRelease(image_provider); 57 | 
CFRelease(file_data_ref); 58 | fprintf(stderr, "Unknown suffix for file '%s'\n", file_name); 59 | *out_width = 0; 60 | *out_height = 0; 61 | *out_channels = 0; 62 | return std::vector(); 63 | } 64 | 65 | const int width = (int)CGImageGetWidth(image); 66 | const int height = (int)CGImageGetHeight(image); 67 | const int channels = 4; 68 | CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB(); 69 | const int bytes_per_row = (width * channels); 70 | const int bytes_in_image = (bytes_per_row * height); 71 | std::vector result(bytes_in_image); 72 | const int bits_per_component = 8; 73 | CGContextRef context = CGBitmapContextCreate(result.data(), width, height, 74 | bits_per_component, bytes_per_row, color_space, 75 | kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big); 76 | CGColorSpaceRelease(color_space); 77 | CGContextDrawImage(context, CGRectMake(0, 0, width, height), image); 78 | CGContextRelease(context); 79 | CFRelease(image); 80 | CFRelease(image_provider); 81 | CFRelease(file_data_ref); 82 | 83 | *out_width = width; 84 | *out_height = height; 85 | *out_channels = channels; 86 | return result; 87 | } 88 | -------------------------------------------------------------------------------- /emoji_demo/main.mm: -------------------------------------------------------------------------------- 1 | // Copyright 2015 Google Inc. All rights reserved. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | // See the License for the specific language governing permissions and 13 | // limitations under the License. 
14 | 15 | #import 16 | 17 | int main(int argc, char * argv[]) { 18 | @autoreleasepool { 19 | NSString *delegateClassName = @"AppDelegate"; 20 | return UIApplicationMain(argc, argv, nil, delegateClassName); 21 | } 22 | } 23 | -------------------------------------------------------------------------------- /emoji_demo/tf_ios_makefile_example.xcodeproj/project.pbxproj: -------------------------------------------------------------------------------- 1 | // !$*UTF8*$! 2 | { 3 | archiveVersion = 1; 4 | classes = { 5 | }; 6 | objectVersion = 46; 7 | objects = { 8 | 9 | /* Begin PBXBuildFile section */ 10 | 590E7D881D02091F00DF5523 /* libprotobuf-lite.a in Frameworks */ = {isa = PBXBuildFile; fileRef = 590E7D861D02091F00DF5523 /* libprotobuf-lite.a */; }; 11 | 590E7D8A1D0209DD00DF5523 /* libprotobuf.a in Frameworks */ = {isa = PBXBuildFile; fileRef = 590E7D871D02091F00DF5523 /* libprotobuf.a */; }; 12 | 5993C7741D5D4EAF0048CE6A /* Accelerate.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5993C7731D5D4EAF0048CE6A /* Accelerate.framework */; }; 13 | 59A3D0011CF4E68100C4259F /* AppDelegate.mm in Sources */ = {isa = PBXBuildFile; fileRef = 59A3CFF21CF4E68100C4259F /* AppDelegate.mm */; }; 14 | 59A3D0031CF4E68100C4259F /* grace_hopper.jpg in Resources */ = {isa = PBXBuildFile; fileRef = 59A3CFF51CF4E68100C4259F /* grace_hopper.jpg */; }; 15 | 59A3D0051CF4E68100C4259F /* imagenet_comp_graph_label_strings.txt in Resources */ = {isa = PBXBuildFile; fileRef = 59A3CFF71CF4E68100C4259F /* imagenet_comp_graph_label_strings.txt */; }; 16 | 59A3D0071CF4E68100C4259F /* tensorflow_inception_graph.pb in Resources */ = {isa = PBXBuildFile; fileRef = 59A3CFF91CF4E68100C4259F /* tensorflow_inception_graph.pb */; }; 17 | 59A3D0081CF4E68100C4259F /* ios_image_load.mm in Sources */ = {isa = PBXBuildFile; fileRef = 59A3CFFB1CF4E68100C4259F /* ios_image_load.mm */; }; 18 | 59A3D0091CF4E68100C4259F /* main.mm in Sources */ = {isa = PBXBuildFile; fileRef = 59A3CFFC1CF4E68100C4259F /* 
main.mm */; }; 19 | 59A3D00B1CF4E68100C4259F /* RunModelViewController.mm in Sources */ = {isa = PBXBuildFile; fileRef = 59A3CFFF1CF4E68100C4259F /* RunModelViewController.mm */; }; 20 | 59A3D00C1CF4E68100C4259F /* RunModelViewController.xib in Resources */ = {isa = PBXBuildFile; fileRef = 59A3D0001CF4E68100C4259F /* RunModelViewController.xib */; }; 21 | 59A3D0141CF4E82500C4259F /* CoreGraphics.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 59A3D0131CF4E82500C4259F /* CoreGraphics.framework */; }; 22 | 59A3D0181CF4E86100C4259F /* UIKit.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 59A3D0171CF4E86100C4259F /* UIKit.framework */; }; 23 | 982D4D971EC2398900B505D9 /* emoji_inference.pb in Resources */ = {isa = PBXBuildFile; fileRef = 982D4D961EC2398500B505D9 /* emoji_inference.pb */; }; 24 | 982D4D991EC2495D00B505D9 /* emoji_frozen.pb in Resources */ = {isa = PBXBuildFile; fileRef = 982D4D981EC2495300B505D9 /* emoji_frozen.pb */; }; 25 | /* End PBXBuildFile section */ 26 | 27 | /* Begin PBXFileReference section */ 28 | 590E7D861D02091F00DF5523 /* libprotobuf-lite.a */ = {isa = PBXFileReference; lastKnownFileType = archive.ar; name = "libprotobuf-lite.a"; path = "../../makefile/gen/protobuf_ios/lib/libprotobuf-lite.a"; sourceTree = ""; }; 29 | 590E7D871D02091F00DF5523 /* libprotobuf.a */ = {isa = PBXFileReference; lastKnownFileType = archive.ar; name = libprotobuf.a; path = ../../makefile/gen/protobuf_ios/lib/libprotobuf.a; sourceTree = ""; }; 30 | 5911579B1CF4011C00C31E3A /* tf_ios_makefile_example.app */ = {isa = PBXFileReference; explicitFileType = wrapper.application; includeInIndex = 0; path = tf_ios_makefile_example.app; sourceTree = BUILT_PRODUCTS_DIR; }; 31 | 5993C7731D5D4EAF0048CE6A /* Accelerate.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = Accelerate.framework; path = System/Library/Frameworks/Accelerate.framework; sourceTree = SDKROOT; }; 32 | 59A3CFF11CF4E68100C4259F /* AppDelegate.h */ = 
{isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AppDelegate.h; sourceTree = ""; }; 33 | 59A3CFF21CF4E68100C4259F /* AppDelegate.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = AppDelegate.mm; sourceTree = ""; }; 34 | 59A3CFF41CF4E68100C4259F /* cropped_panda.jpg */ = {isa = PBXFileReference; lastKnownFileType = image.jpeg; path = cropped_panda.jpg; sourceTree = ""; }; 35 | 59A3CFF51CF4E68100C4259F /* grace_hopper.jpg */ = {isa = PBXFileReference; lastKnownFileType = image.jpeg; path = grace_hopper.jpg; sourceTree = ""; }; 36 | 59A3CFF61CF4E68100C4259F /* imagenet_2012_challenge_label_map_proto.pbtxt */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = imagenet_2012_challenge_label_map_proto.pbtxt; sourceTree = ""; }; 37 | 59A3CFF71CF4E68100C4259F /* imagenet_comp_graph_label_strings.txt */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = imagenet_comp_graph_label_strings.txt; sourceTree = ""; }; 38 | 59A3CFF81CF4E68100C4259F /* LICENSE */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = LICENSE; sourceTree = ""; }; 39 | 59A3CFF91CF4E68100C4259F /* tensorflow_inception_graph.pb */ = {isa = PBXFileReference; lastKnownFileType = file; path = tensorflow_inception_graph.pb; sourceTree = ""; }; 40 | 59A3CFFA1CF4E68100C4259F /* ios_image_load.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = ios_image_load.h; sourceTree = ""; }; 41 | 59A3CFFB1CF4E68100C4259F /* ios_image_load.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = ios_image_load.mm; sourceTree = ""; }; 42 | 59A3CFFC1CF4E68100C4259F /* main.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = main.mm; sourceTree = ""; }; 43 | 59A3CFFD1CF4E68100C4259F /* RunModel-Info.plist */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = text.plist.xml; path = "RunModel-Info.plist"; sourceTree = ""; }; 44 | 59A3CFFE1CF4E68100C4259F /* RunModelViewController.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RunModelViewController.h; sourceTree = ""; }; 45 | 59A3CFFF1CF4E68100C4259F /* RunModelViewController.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = RunModelViewController.mm; sourceTree = ""; }; 46 | 59A3D0001CF4E68100C4259F /* RunModelViewController.xib */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = file.xib; path = RunModelViewController.xib; sourceTree = ""; }; 47 | 59A3D0131CF4E82500C4259F /* CoreGraphics.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = CoreGraphics.framework; path = System/Library/Frameworks/CoreGraphics.framework; sourceTree = SDKROOT; }; 48 | 59A3D0151CF4E83D00C4259F /* Foundation.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = Foundation.framework; path = System/Library/Frameworks/Foundation.framework; sourceTree = SDKROOT; }; 49 | 59A3D0171CF4E86100C4259F /* UIKit.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = UIKit.framework; path = System/Library/Frameworks/UIKit.framework; sourceTree = SDKROOT; }; 50 | 982D4D961EC2398500B505D9 /* emoji_inference.pb */ = {isa = PBXFileReference; lastKnownFileType = file; path = emoji_inference.pb; sourceTree = ""; }; 51 | 982D4D981EC2495300B505D9 /* emoji_frozen.pb */ = {isa = PBXFileReference; lastKnownFileType = file; path = emoji_frozen.pb; sourceTree = ""; }; 52 | /* End PBXFileReference section */ 53 | 54 | /* Begin PBXFrameworksBuildPhase section */ 55 | 591157981CF4011C00C31E3A /* Frameworks */ = { 56 | isa = PBXFrameworksBuildPhase; 57 | buildActionMask = 2147483647; 58 | files = ( 59 | 5993C7741D5D4EAF0048CE6A /* Accelerate.framework in 
Frameworks */, 60 | 590E7D8A1D0209DD00DF5523 /* libprotobuf.a in Frameworks */, 61 | 590E7D881D02091F00DF5523 /* libprotobuf-lite.a in Frameworks */, 62 | 59A3D0181CF4E86100C4259F /* UIKit.framework in Frameworks */, 63 | 59A3D0141CF4E82500C4259F /* CoreGraphics.framework in Frameworks */, 64 | ); 65 | runOnlyForDeploymentPostprocessing = 0; 66 | }; 67 | /* End PBXFrameworksBuildPhase section */ 68 | 69 | /* Begin PBXGroup section */ 70 | 591157921CF4011C00C31E3A = { 71 | isa = PBXGroup; 72 | children = ( 73 | 5993C7731D5D4EAF0048CE6A /* Accelerate.framework */, 74 | 590E7D861D02091F00DF5523 /* libprotobuf-lite.a */, 75 | 590E7D871D02091F00DF5523 /* libprotobuf.a */, 76 | 59A3D0171CF4E86100C4259F /* UIKit.framework */, 77 | 59A3D0151CF4E83D00C4259F /* Foundation.framework */, 78 | 59A3D0131CF4E82500C4259F /* CoreGraphics.framework */, 79 | 59A3CFF11CF4E68100C4259F /* AppDelegate.h */, 80 | 59A3CFF21CF4E68100C4259F /* AppDelegate.mm */, 81 | 59A3CFF31CF4E68100C4259F /* data */, 82 | 59A3CFFA1CF4E68100C4259F /* ios_image_load.h */, 83 | 59A3CFFB1CF4E68100C4259F /* ios_image_load.mm */, 84 | 59A3CFFC1CF4E68100C4259F /* main.mm */, 85 | 59A3CFFD1CF4E68100C4259F /* RunModel-Info.plist */, 86 | 59A3CFFE1CF4E68100C4259F /* RunModelViewController.h */, 87 | 59A3CFFF1CF4E68100C4259F /* RunModelViewController.mm */, 88 | 59A3D0001CF4E68100C4259F /* RunModelViewController.xib */, 89 | 5911579C1CF4011C00C31E3A /* Products */, 90 | ); 91 | sourceTree = ""; 92 | }; 93 | 5911579C1CF4011C00C31E3A /* Products */ = { 94 | isa = PBXGroup; 95 | children = ( 96 | 5911579B1CF4011C00C31E3A /* tf_ios_makefile_example.app */, 97 | ); 98 | name = Products; 99 | sourceTree = ""; 100 | }; 101 | 59A3CFF31CF4E68100C4259F /* data */ = { 102 | isa = PBXGroup; 103 | children = ( 104 | 982D4D981EC2495300B505D9 /* emoji_frozen.pb */, 105 | 982D4D961EC2398500B505D9 /* emoji_inference.pb */, 106 | 59A3CFF41CF4E68100C4259F /* cropped_panda.jpg */, 107 | 59A3CFF51CF4E68100C4259F /* grace_hopper.jpg */, 
108 | 59A3CFF61CF4E68100C4259F /* imagenet_2012_challenge_label_map_proto.pbtxt */, 109 | 59A3CFF71CF4E68100C4259F /* imagenet_comp_graph_label_strings.txt */, 110 | 59A3CFF81CF4E68100C4259F /* LICENSE */, 111 | 59A3CFF91CF4E68100C4259F /* tensorflow_inception_graph.pb */, 112 | ); 113 | path = data; 114 | sourceTree = ""; 115 | }; 116 | /* End PBXGroup section */ 117 | 118 | /* Begin PBXNativeTarget section */ 119 | 5911579A1CF4011C00C31E3A /* tf_ios_makefile_example */ = { 120 | isa = PBXNativeTarget; 121 | buildConfigurationList = 591157B21CF4011D00C31E3A /* Build configuration list for PBXNativeTarget "tf_ios_makefile_example" */; 122 | buildPhases = ( 123 | 591157971CF4011C00C31E3A /* Sources */, 124 | 591157981CF4011C00C31E3A /* Frameworks */, 125 | 591157991CF4011C00C31E3A /* Resources */, 126 | ); 127 | buildRules = ( 128 | ); 129 | dependencies = ( 130 | ); 131 | name = tf_ios_makefile_example; 132 | productName = tf_ios_makefile_example; 133 | productReference = 5911579B1CF4011C00C31E3A /* tf_ios_makefile_example.app */; 134 | productType = "com.apple.product-type.application"; 135 | }; 136 | /* End PBXNativeTarget section */ 137 | 138 | /* Begin PBXProject section */ 139 | 591157931CF4011C00C31E3A /* Project object */ = { 140 | isa = PBXProject; 141 | attributes = { 142 | LastUpgradeCheck = 0830; 143 | ORGANIZATIONNAME = Google; 144 | TargetAttributes = { 145 | 5911579A1CF4011C00C31E3A = { 146 | CreatedOnToolsVersion = 7.2; 147 | DevelopmentTeam = H26V4ZD73J; 148 | }; 149 | }; 150 | }; 151 | buildConfigurationList = 591157961CF4011C00C31E3A /* Build configuration list for PBXProject "tf_ios_makefile_example" */; 152 | compatibilityVersion = "Xcode 3.2"; 153 | developmentRegion = English; 154 | hasScannedForEncodings = 0; 155 | knownRegions = ( 156 | en, 157 | Base, 158 | ); 159 | mainGroup = 591157921CF4011C00C31E3A; 160 | productRefGroup = 5911579C1CF4011C00C31E3A /* Products */; 161 | projectDirPath = ""; 162 | projectRoot = ""; 163 | targets = ( 164 | 
5911579A1CF4011C00C31E3A /* tf_ios_makefile_example */, 165 | ); 166 | }; 167 | /* End PBXProject section */ 168 | 169 | /* Begin PBXResourcesBuildPhase section */ 170 | 591157991CF4011C00C31E3A /* Resources */ = { 171 | isa = PBXResourcesBuildPhase; 172 | buildActionMask = 2147483647; 173 | files = ( 174 | 59A3D00C1CF4E68100C4259F /* RunModelViewController.xib in Resources */, 175 | 59A3D0051CF4E68100C4259F /* imagenet_comp_graph_label_strings.txt in Resources */, 176 | 59A3D0071CF4E68100C4259F /* tensorflow_inception_graph.pb in Resources */, 177 | 59A3D0031CF4E68100C4259F /* grace_hopper.jpg in Resources */, 178 | 982D4D991EC2495D00B505D9 /* emoji_frozen.pb in Resources */, 179 | 982D4D971EC2398900B505D9 /* emoji_inference.pb in Resources */, 180 | ); 181 | runOnlyForDeploymentPostprocessing = 0; 182 | }; 183 | /* End PBXResourcesBuildPhase section */ 184 | 185 | /* Begin PBXSourcesBuildPhase section */ 186 | 591157971CF4011C00C31E3A /* Sources */ = { 187 | isa = PBXSourcesBuildPhase; 188 | buildActionMask = 2147483647; 189 | files = ( 190 | 59A3D0091CF4E68100C4259F /* main.mm in Sources */, 191 | 59A3D0011CF4E68100C4259F /* AppDelegate.mm in Sources */, 192 | 59A3D00B1CF4E68100C4259F /* RunModelViewController.mm in Sources */, 193 | 59A3D0081CF4E68100C4259F /* ios_image_load.mm in Sources */, 194 | ); 195 | runOnlyForDeploymentPostprocessing = 0; 196 | }; 197 | /* End PBXSourcesBuildPhase section */ 198 | 199 | /* Begin XCBuildConfiguration section */ 200 | 591157B01CF4011D00C31E3A /* Debug */ = { 201 | isa = XCBuildConfiguration; 202 | buildSettings = { 203 | ALWAYS_SEARCH_USER_PATHS = NO; 204 | CLANG_CXX_LANGUAGE_STANDARD = "gnu++0x"; 205 | CLANG_CXX_LIBRARY = "libc++"; 206 | CLANG_ENABLE_MODULES = YES; 207 | CLANG_ENABLE_OBJC_ARC = YES; 208 | CLANG_WARN_BOOL_CONVERSION = YES; 209 | CLANG_WARN_CONSTANT_CONVERSION = YES; 210 | CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; 211 | CLANG_WARN_EMPTY_BODY = YES; 212 | CLANG_WARN_ENUM_CONVERSION = YES; 213 | 
CLANG_WARN_INFINITE_RECURSION = YES; 214 | CLANG_WARN_INT_CONVERSION = YES; 215 | CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; 216 | CLANG_WARN_SUSPICIOUS_MOVE = YES; 217 | CLANG_WARN_UNREACHABLE_CODE = YES; 218 | CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; 219 | "CODE_SIGN_IDENTITY[sdk=iphoneos*]" = "iPhone Developer"; 220 | COPY_PHASE_STRIP = NO; 221 | DEBUG_INFORMATION_FORMAT = dwarf; 222 | ENABLE_STRICT_OBJC_MSGSEND = YES; 223 | ENABLE_TESTABILITY = YES; 224 | GCC_C_LANGUAGE_STANDARD = gnu99; 225 | GCC_DYNAMIC_NO_PIC = NO; 226 | GCC_NO_COMMON_BLOCKS = YES; 227 | GCC_OPTIMIZATION_LEVEL = 0; 228 | GCC_PREPROCESSOR_DEFINITIONS = ( 229 | "DEBUG=1", 230 | "$(inherited)", 231 | ); 232 | GCC_WARN_64_TO_32_BIT_CONVERSION = YES; 233 | GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; 234 | GCC_WARN_UNDECLARED_SELECTOR = YES; 235 | GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; 236 | GCC_WARN_UNUSED_FUNCTION = YES; 237 | GCC_WARN_UNUSED_VARIABLE = YES; 238 | IPHONEOS_DEPLOYMENT_TARGET = 9.2; 239 | MTL_ENABLE_DEBUG_INFO = YES; 240 | ONLY_ACTIVE_ARCH = YES; 241 | SDKROOT = iphoneos; 242 | TARGETED_DEVICE_FAMILY = "1,2"; 243 | }; 244 | name = Debug; 245 | }; 246 | 591157B11CF4011D00C31E3A /* Release */ = { 247 | isa = XCBuildConfiguration; 248 | buildSettings = { 249 | ALWAYS_SEARCH_USER_PATHS = NO; 250 | CLANG_CXX_LANGUAGE_STANDARD = "gnu++0x"; 251 | CLANG_CXX_LIBRARY = "libc++"; 252 | CLANG_ENABLE_MODULES = YES; 253 | CLANG_ENABLE_OBJC_ARC = YES; 254 | CLANG_WARN_BOOL_CONVERSION = YES; 255 | CLANG_WARN_CONSTANT_CONVERSION = YES; 256 | CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; 257 | CLANG_WARN_EMPTY_BODY = YES; 258 | CLANG_WARN_ENUM_CONVERSION = YES; 259 | CLANG_WARN_INFINITE_RECURSION = YES; 260 | CLANG_WARN_INT_CONVERSION = YES; 261 | CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; 262 | CLANG_WARN_SUSPICIOUS_MOVE = YES; 263 | CLANG_WARN_UNREACHABLE_CODE = YES; 264 | CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; 265 | "CODE_SIGN_IDENTITY[sdk=iphoneos*]" = "iPhone Developer"; 266 | 
COPY_PHASE_STRIP = NO; 267 | DEBUG_INFORMATION_FORMAT = "dwarf-with-dsym"; 268 | ENABLE_NS_ASSERTIONS = NO; 269 | ENABLE_STRICT_OBJC_MSGSEND = YES; 270 | GCC_C_LANGUAGE_STANDARD = gnu99; 271 | GCC_NO_COMMON_BLOCKS = YES; 272 | GCC_WARN_64_TO_32_BIT_CONVERSION = YES; 273 | GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; 274 | GCC_WARN_UNDECLARED_SELECTOR = YES; 275 | GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; 276 | GCC_WARN_UNUSED_FUNCTION = YES; 277 | GCC_WARN_UNUSED_VARIABLE = YES; 278 | IPHONEOS_DEPLOYMENT_TARGET = 9.2; 279 | MTL_ENABLE_DEBUG_INFO = NO; 280 | SDKROOT = iphoneos; 281 | TARGETED_DEVICE_FAMILY = "1,2"; 282 | VALIDATE_PRODUCT = YES; 283 | }; 284 | name = Release; 285 | }; 286 | 591157B31CF4011D00C31E3A /* Debug */ = { 287 | isa = XCBuildConfiguration; 288 | buildSettings = { 289 | CLANG_DEBUG_INFORMATION_LEVEL = default; 290 | CODE_SIGN_IDENTITY = "iPhone Developer"; 291 | DEVELOPMENT_TEAM = H26V4ZD73J; 292 | ENABLE_BITCODE = NO; 293 | GCC_ENABLE_CPP_EXCEPTIONS = YES; 294 | GCC_ENABLE_CPP_RTTI = YES; 295 | HEADER_SEARCH_PATHS = ( 296 | "$(SRCROOT)/../../../..", 297 | "$(SRCROOT)/../../makefile/downloads/protobuf/src/", 298 | "$(SRCROOT)/../../makefile/downloads", 299 | "$(SRCROOT)/../../makefile/gen/proto", 300 | "$(SRCROOT)/../../makefile/downloads/eigen", 301 | ); 302 | INFOPLIST_FILE = "$(SRCROOT)/RunModel-Info.plist"; 303 | IPHONEOS_DEPLOYMENT_TARGET = 9.2; 304 | LD_RUNPATH_SEARCH_PATHS = "$(inherited) @executable_path/Frameworks"; 305 | LIBRARY_SEARCH_PATHS = ( 306 | "$(SRCROOT)/../../makefile/gen/protobuf_ios/lib", 307 | "$(SRCROOT)/../../makefile/gen/lib", 308 | ); 309 | OTHER_CPLUSPLUSFLAGS = "$(OTHER_CFLAGS)"; 310 | OTHER_LDFLAGS = ( 311 | "-force_load", 312 | "$(SRCROOT)/../../makefile/gen/lib/libtensorflow-core.a", 313 | "-Xlinker", 314 | "-S", 315 | "-Xlinker", 316 | "-x", 317 | "-Xlinker", 318 | "-dead_strip", 319 | ); 320 | PRODUCT_BUNDLE_IDENTIFIER = "com.google.TF-Test"; 321 | PRODUCT_NAME = "$(TARGET_NAME)"; 322 | SEPARATE_STRIP = NO; 
323 | }; 324 | name = Debug; 325 | }; 326 | 591157B41CF4011D00C31E3A /* Release */ = { 327 | isa = XCBuildConfiguration; 328 | buildSettings = { 329 | CLANG_DEBUG_INFORMATION_LEVEL = default; 330 | CODE_SIGN_IDENTITY = "iPhone Developer"; 331 | DEVELOPMENT_TEAM = H26V4ZD73J; 332 | ENABLE_BITCODE = NO; 333 | GCC_ENABLE_CPP_EXCEPTIONS = YES; 334 | GCC_ENABLE_CPP_RTTI = YES; 335 | HEADER_SEARCH_PATHS = ( 336 | "$(SRCROOT)/../../../..", 337 | "$(SRCROOT)/../../makefile/downloads/protobuf/src/", 338 | "$(SRCROOT)/../../makefile/downloads", 339 | "$(SRCROOT)/../../makefile/gen/proto", 340 | "$(SRCROOT)/../../makefile/downloads/eigen", 341 | ); 342 | INFOPLIST_FILE = "$(SRCROOT)/RunModel-Info.plist"; 343 | IPHONEOS_DEPLOYMENT_TARGET = 9.2; 344 | LD_RUNPATH_SEARCH_PATHS = "$(inherited) @executable_path/Frameworks"; 345 | LIBRARY_SEARCH_PATHS = ( 346 | "$(SRCROOT)/../../makefile/gen/protobuf_ios/lib", 347 | "$(SRCROOT)/../../makefile/gen/lib", 348 | ); 349 | ONLY_ACTIVE_ARCH = YES; 350 | OTHER_CPLUSPLUSFLAGS = "$(OTHER_CFLAGS)"; 351 | OTHER_LDFLAGS = ( 352 | "-force_load", 353 | "$(SRCROOT)/../../makefile/gen/lib/libtensorflow-core.a", 354 | "-Xlinker", 355 | "-S", 356 | "-Xlinker", 357 | "-x", 358 | "-Xlinker", 359 | "-dead_strip", 360 | ); 361 | PRODUCT_BUNDLE_IDENTIFIER = "com.google.TF-Test"; 362 | PRODUCT_NAME = "$(TARGET_NAME)"; 363 | SEPARATE_STRIP = NO; 364 | }; 365 | name = Release; 366 | }; 367 | /* End XCBuildConfiguration section */ 368 | 369 | /* Begin XCConfigurationList section */ 370 | 591157961CF4011C00C31E3A /* Build configuration list for PBXProject "tf_ios_makefile_example" */ = { 371 | isa = XCConfigurationList; 372 | buildConfigurations = ( 373 | 591157B01CF4011D00C31E3A /* Debug */, 374 | 591157B11CF4011D00C31E3A /* Release */, 375 | ); 376 | defaultConfigurationIsVisible = 0; 377 | defaultConfigurationName = Release; 378 | }; 379 | 591157B21CF4011D00C31E3A /* Build configuration list for PBXNativeTarget "tf_ios_makefile_example" */ = { 380 | isa = 
XCConfigurationList; 381 | buildConfigurations = ( 382 | 591157B31CF4011D00C31E3A /* Debug */, 383 | 591157B41CF4011D00C31E3A /* Release */, 384 | ); 385 | defaultConfigurationIsVisible = 0; 386 | defaultConfigurationName = Release; 387 | }; 388 | /* End XCConfigurationList section */ 389 | }; 390 | rootObject = 591157931CF4011C00C31E3A /* Project object */; 391 | } 392 | -------------------------------------------------------------------------------- /emoji_demo/tf_ios_makefile_example.xcodeproj/project.xcworkspace/contents.xcworkspacedata: -------------------------------------------------------------------------------- 1 | 2 | 4 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /emoji_demo/tf_ios_makefile_example.xcodeproj/project.xcworkspace/xcuserdata/h4x.xcuserdatad/UserInterfaceState.xcuserstate: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/h4x3rotab/emoji-tf-ios/e84a0ec65f9020c720b685ba47ce9c7dae3d21d9/emoji_demo/tf_ios_makefile_example.xcodeproj/project.xcworkspace/xcuserdata/h4x.xcuserdatad/UserInterfaceState.xcuserstate -------------------------------------------------------------------------------- /emoji_demo/tf_ios_makefile_example.xcodeproj/xcuserdata/h4x.xcuserdatad/xcdebugger/Breakpoints_v2.xcbkptlist: -------------------------------------------------------------------------------- 1 | 2 | 5 | 6 | -------------------------------------------------------------------------------- /emoji_demo/tf_ios_makefile_example.xcodeproj/xcuserdata/h4x.xcuserdatad/xcschemes/tf_ios_makefile_example.xcscheme: -------------------------------------------------------------------------------- 1 | 2 | 5 | 8 | 9 | 15 | 21 | 22 | 23 | 24 | 25 | 30 | 31 | 32 | 33 | 39 | 40 | 41 | 42 | 43 | 44 | 54 | 56 | 62 | 63 | 64 | 65 | 66 | 67 | 73 | 75 | 81 | 82 | 83 | 84 | 86 | 87 | 90 | 91 | 92 | 
-------------------------------------------------------------------------------- /emoji_demo/tf_ios_makefile_example.xcodeproj/xcuserdata/h4x.xcuserdatad/xcschemes/xcschememanagement.plist: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | SchemeUserState 6 | 7 | tf_ios_makefile_example.xcscheme 8 | 9 | orderHint 10 | 0 11 | 12 | 13 | SuppressBuildableAutocreation 14 | 15 | 5911579A1CF4011C00C31E3A 16 | 17 | primary 18 | 19 | 20 | 21 | 22 | 23 | -------------------------------------------------------------------------------- /export_tf_model.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": true, 8 | "deletable": true, 9 | "editable": true 10 | }, 11 | "outputs": [], 12 | "source": [ 13 | "from keras.models import Model, load_model, model_from_config\n", 14 | "from keras import backend as K\n", 15 | "from tensorflow.contrib.session_bundle import exporter\n", 16 | "from tensorflow.python import saved_model\n", 17 | "import tensorflow as tf" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": { 24 | "collapsed": true, 25 | "deletable": true, 26 | "editable": true 27 | }, 28 | "outputs": [], 29 | "source": [ 30 | "sess = tf.Session()\n", 31 | "K.set_session(sess)\n", 32 | "K.set_learning_phase(0) # all new operations will be in test mode from now on" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": { 39 | "collapsed": true, 40 | "deletable": true, 41 | "editable": true 42 | }, 43 | "outputs": [], 44 | "source": [ 45 | "orig_model = load_model('p5-40-test.hdf5')\n", 46 | "weights = orig_model.get_weights()\n", 47 | "model = model_from_config({\n", 48 | " 'class_name': 'Model',\n", 49 | " 'config': orig_model.get_config(),\n", 50 | "})\n", 51 | "model.set_weights(weights)" 52 | ] 
53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": { 58 | "collapsed": true, 59 | "deletable": true, 60 | "editable": true 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "tf.train.write_graph(sess.graph_def, 'export/p5-40-test-serving', \"graph-serving.pb\", True)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": { 71 | "collapsed": true, 72 | "deletable": true, 73 | "editable": true, 74 | "scrolled": false 75 | }, 76 | "outputs": [], 77 | "source": [ 78 | "saver = tf.train.Saver()" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": { 85 | "collapsed": true, 86 | "deletable": true, 87 | "editable": true 88 | }, 89 | "outputs": [], 90 | "source": [ 91 | "saver.save(sess, 'export/p5-40-test-serving/model-ckpt')" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": { 98 | "collapsed": true, 99 | "deletable": true, 100 | "editable": true 101 | }, 102 | "outputs": [], 103 | "source": [] 104 | } 105 | ], 106 | "metadata": { 107 | "kernelspec": { 108 | "display_name": "Python 3", 109 | "language": "python", 110 | "name": "python3" 111 | } 112 | }, 113 | "nbformat": 4, 114 | "nbformat_minor": 2 115 | } 116 | -------------------------------------------------------------------------------- /extract.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | import json 4 | import re 5 | import sys 6 | 7 | # This is not a perfect way to match emojis, but it is good enough for a demo. 
8 | re_emoji = re.compile(u'[' 9 | u'\U0001F300-\U0001F64F' 10 | u'\U0001F680-\U0001F6FF' 11 | u'\u2600-\u26FF\u2700-\u27BF]', 12 | re.UNICODE) 13 | 14 | c_bad_json = 0 15 | c_bad_lang = 0 16 | c_no_text = 0 17 | 18 | for line in sys.stdin: 19 | try: 20 | t = json.loads(line) 21 | except ValueError: # malformed JSON line 22 | c_bad_json += 1 23 | continue 24 | if 'lang' not in t or t['lang'] != 'en': 25 | c_bad_lang += 1 26 | continue 27 | if 'text' not in t: 28 | c_no_text += 1 29 | continue 30 | text = t['text'] 31 | if re_emoji.search(text): 32 | print(line.strip()) 33 | -------------------------------------------------------------------------------- /extract_all.sh: -------------------------------------------------------------------------------- 1 | INPUT=/Volumes/Archive/download/twitter201701/01 2 | 3 | find "${INPUT}" -iname '*.bz2' -print0 | while IFS= read -r -d $'\0' line; do 4 | echo "Processing $line" 5 | bzip2 -dc "$line" | python3 extract.py >> data/extracted.list 6 | done 7 | -------------------------------------------------------------------------------- /p5-40-test.hdf5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/h4x3rotab/emoji-tf-ios/e84a0ec65f9020c720b685ba47ce9c7dae3d21d9/p5-40-test.hdf5 -------------------------------------------------------------------------------- /replayer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from keras.models import Model, load_model\n", 10 | "from keras.preprocessing import sequence\n", 11 | "import keras\n", 12 | "import numpy as np\n", 13 | "import pickle\n", 14 | "import random\n", 15 | "import tensorflow as tf" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": null, 21 | "metadata": { 22 | "collapsed": true 23 | }, 24 | "outputs": [], 25 | "source": [ 
26 | "model = 
load_model('p5-40-test.hdf5')" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "with open('data/plain_dataset_meta.pickle', 'rb') as fin:\n", 38 | " dataset_meta = pickle.load(fin)" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "num_alphabet = 36 # (3+33)\n", 50 | "num_cat = 99 # (1+98)\n", 51 | "\n", 52 | "T_PAD = 0\n", 53 | "T_START = 1\n", 54 | "T_OOV = 2\n", 55 | "def convert_to_input(sent):\n", 56 | " aidx = dataset_meta['alphabet_idx']\n", 57 | " sent = sent.lower()\n", 58 | " f = [T_START]\n", 59 | " for ch in sent:\n", 60 | " if ch in aidx:\n", 61 | " f.append(aidx[ch])\n", 62 | " else:\n", 63 | " pass # skip OOV\n", 64 | " return f" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": { 71 | "collapsed": true 72 | }, 73 | "outputs": [], 74 | "source": [ 75 | "def top_n_predicate(pred, n=5):\n", 76 | " emojis = dataset_meta['emoji']\n", 77 | " preds = enumerate(list(pred.reshape(pred.shape[1],))[:-1])\n", 78 | " preds = sorted(preds, key=lambda x: -x[1])[:n]\n", 79 | " if False:\n", 80 | " print(preds)\n", 81 | " return [emojis[idx] for idx, pred in preds]\n", 82 | "\n", 83 | "def run(sent):\n", 84 | " input_sample = convert_to_input(sent)\n", 85 | " input_x = sequence.pad_sequences([input_sample], maxlen=120)\n", 86 | " val = model.predict(input_x)\n", 87 | " return top_n_predicate(val)" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "run('I\\'ll continue on this path for 2017.')" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "run('thank you sis')" 106 | ] 107 | }, 108 | { 109 | 
"cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "run('Don\\'t forget to wrap it up, twice')" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "run('I wanna go home and go to sleep')" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [ 132 | "run('happy new year! God Bless')" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [ 141 | "run('HAPPY NEW YEAR here\\'s to many more amazing memories')" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "run('day 1 of 365 thank you God for allowing me to see this day')" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "run('The art of knowing is knowing to \"IGNORE\". Good morning')" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "run('Nope. 
I Know He Likes Telling Stories')" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "run('but they won\\'t listen')" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": null, 183 | "metadata": {}, 184 | "outputs": [], 185 | "source": [ 186 | "run('Bitcoin has reached a new all-time high of $1,600!')" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "seq = convert_to_input('day 1 of 365 thank you God for allowing me to see this day')\n", 196 | "sequence.pad_sequences([seq], maxlen=120)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": null, 202 | "metadata": { 203 | "collapsed": true 204 | }, 205 | "outputs": [], 206 | "source": [] 207 | } 208 | ], 209 | "metadata": { 210 | "kernelspec": { 211 | "display_name": "Python 3", 212 | "language": "python", 213 | "name": "python3" 214 | }, 215 | "language_info": { 216 | "codemirror_mode": { 217 | "name": "ipython", 218 | "version": 3 219 | }, 220 | "file_extension": ".py", 221 | "mimetype": "text/x-python", 222 | "name": "python", 223 | "nbconvert_exporter": "python", 224 | "pygments_lexer": "ipython3", 225 | "version": "3.6.1" 226 | } 227 | }, 228 | "nbformat": 4, 229 | "nbformat_minor": 2 230 | } 231 | -------------------------------------------------------------------------------- /stats_top.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import re 3 | import json 4 | from collections import defaultdict 5 | 6 | # This is not a perfect way to match emojis, but it is good enough for a demo. 
7 | re_emoji = re.compile(u'[' 8 | u'\U0001F300-\U0001F64F' 9 | u'\U0001F680-\U0001F6FF' 10 | u'\u2600-\u26FF\u2700-\u27BF]', 11 | re.UNICODE) 12 | 13 | c = defaultdict(int) 14 | readed = 0 15 | last_reported = 0 16 | 17 | try: 18 | with open('data/extracted.list') as fin: 19 | for line in fin: 20 | readed += len(line) 21 | if not line.strip(): 22 | continue 23 | t = json.loads(line) 24 | for emoji in re_emoji.findall(t['text']): 25 | c[emoji] += 1 26 | 27 | readed_mb = readed // (100 * 1024 * 1024) 28 | if readed_mb > last_reported: 29 | last_reported = readed_mb 30 | print('Processed:', readed_mb * 100) 31 | finally: 32 | out = '' 33 | for k, v in sorted(c.items(), key=lambda x: -x[1]): 34 | out += '{}: {}\n'.format(k, v) 35 | print(out) 36 | with open('data/stat.txt', 'w') as fout: 37 | fout.write(out) 38 | -------------------------------------------------------------------------------- /tokenize_dataset.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 4, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import pickle\n", 12 | "from collections import defaultdict" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 5, 18 | "metadata": { 19 | "collapsed": true 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "with open('data/dataset.pickle', 'rb') as fin:\n", 24 | " dataset = pickle.load(fin)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 6, 30 | "metadata": { 31 | "collapsed": true 32 | }, 33 | "outputs": [], 34 | "source": [ 35 | "chc = defaultdict(int)\n", 36 | "for emoji, items in dataset.items():\n", 37 | " for sent in items:\n", 38 | " for ch in sent:\n", 39 | " chc[ch] += 1" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 7, 45 | "metadata": { 46 | "collapsed": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "CH_PADDING = 0\n", 
51 | "CH_START = 1\n", 52 | "CH_OOV = 2\n", 53 | "\n", 54 | "ch_stats = sorted(chc.items(), key=lambda x: -x[1])\n", 55 | "alphabet = [ch for ch, _ in ch_stats[:33]]\n", 56 | "idx_alphabet = {ch: idx+3 for idx, ch in enumerate(alphabet)}\n", 57 | "idx_emoji = {emoji: idx for idx, emoji in enumerate(dataset.keys())}\n" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 5, 63 | "metadata": { 64 | "scrolled": false 65 | }, 66 | "outputs": [ 67 | { 68 | "name": "stdout", 69 | "output_type": "stream", 70 | "text": [ 71 | "defaultdict(, {'good': 3974488, 'skiped-oov': 84422, 'skiped-invalid-sent': 50})\n" 72 | ] 73 | } 74 | ], 75 | "source": [ 76 | "c = defaultdict(int)\n", 77 | "plain_x = []\n", 78 | "plain_y = []\n", 79 | "for emoji, items in dataset.items():\n", 80 | " for sent in items:\n", 81 | " seq = [CH_START]\n", 82 | " for ch in sent:\n", 83 | " if ch in idx_alphabet:\n", 84 | " seq.append(idx_alphabet[ch])\n", 85 | " else:\n", 86 | " c['skiped-oov'] += 1\n", 87 | " if len(seq) == 1:\n", 88 | " c['skiped-invalid-sent'] += 1\n", 89 | " continue\n", 90 | " plain_x.append(seq)\n", 91 | " plain_y.append(idx_emoji[emoji])\n", 92 | " c['good'] += 1\n", 93 | "print(c)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 15, 99 | "metadata": { 100 | "collapsed": true 101 | }, 102 | "outputs": [], 103 | "source": [ 104 | "import random\n", 105 | "SEED = 0\n", 106 | "random.seed(SEED)\n", 107 | "random.shuffle(plain_x)\n", 108 | "random.seed(SEED)\n", 109 | "random.shuffle(plain_y)" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 10, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/plain": [ 120 | "{0: '🎉',\n", 121 | " 1: '🎈',\n", 122 | " 2: '🏽',\n", 123 | " 3: '🙄',\n", 124 | " 4: '👑',\n", 125 | " 5: '✨',\n", 126 | " 6: '💞',\n", 127 | " 7: '💕',\n", 128 | " 8: '❤',\n", 129 | " 9: '😏',\n", 130 | " 10: '🔥',\n", 131 | " 11: '😎',\n", 132 | " 12: '💀',\n", 133 | " 13: '😂',\n", 134 
| " 14: '😍',\n", 135 | " 15: '😊',\n", 136 | " 16: '😈',\n", 137 | " 17: '♥',\n", 138 | " 18: '💔',\n", 139 | " 19: '😅',\n", 140 | " 20: '🌟',\n", 141 | " 21: '😜',\n", 142 | " 22: '😭',\n", 143 | " 23: '💗',\n", 144 | " 24: '😋',\n", 145 | " 25: '🌹',\n", 146 | " 26: '😩',\n", 147 | " 27: '💦',\n", 148 | " 28: '♂',\n", 149 | " 29: '🙏',\n", 150 | " 30: '☺',\n", 151 | " 31: '💯',\n", 152 | " 32: '😆',\n", 153 | " 33: '➡',\n", 154 | " 34: '🙌',\n", 155 | " 35: '💜',\n", 156 | " 36: '✔',\n", 157 | " 37: '💓',\n", 158 | " 38: '💙',\n", 159 | " 39: '😀',\n", 160 | " 40: '👉',\n", 161 | " 41: '😬',\n", 162 | " 42: '👌',\n", 163 | " 43: '😘',\n", 164 | " 44: '♡',\n", 165 | " 45: '🙃',\n", 166 | " 46: '😁',\n", 167 | " 47: '🙂',\n", 168 | " 48: '👀',\n", 169 | " 49: '💃',\n", 170 | " 50: '💛',\n", 171 | " 51: '👏',\n", 172 | " 52: '👍',\n", 173 | " 53: '😛',\n", 174 | " 54: '💪',\n", 175 | " 55: '💋',\n", 176 | " 56: '😻',\n", 177 | " 57: '😉',\n", 178 | " 58: '😄',\n", 179 | " 59: '😴',\n", 180 | " 60: '💥',\n", 181 | " 61: '💖',\n", 182 | " 62: '😤',\n", 183 | " 63: '🚨',\n", 184 | " 64: '⚡',\n", 185 | " 65: '😳',\n", 186 | " 66: '🎶',\n", 187 | " 67: '🗣',\n", 188 | " 68: '👅',\n", 189 | " 69: '😫',\n", 190 | " 70: '✌',\n", 191 | " 71: '💚',\n", 192 | " 72: '🙈',\n", 193 | " 73: '😇',\n", 194 | " 74: '😒',\n", 195 | " 75: '😌',\n", 196 | " 76: '❗',\n", 197 | " 77: '😢',\n", 198 | " 78: '😕',\n", 199 | " 79: '👊',\n", 200 | " 80: '🌙',\n", 201 | " 81: '👇',\n", 202 | " 82: '😔',\n", 203 | " 83: '❄',\n", 204 | " 84: '💘',\n", 205 | " 85: '✊',\n", 206 | " 86: '💫',\n", 207 | " 87: '😡',\n", 208 | " 88: '♀',\n", 209 | " 89: '🏆',\n", 210 | " 90: '🌸',\n", 211 | " 91: '★',\n", 212 | " 92: '😱',\n", 213 | " 93: '📷',\n", 214 | " 94: '💰',\n", 215 | " 95: '⚽',\n", 216 | " 96: '🐐',\n", 217 | " 97: '✅'}" 218 | ] 219 | }, 220 | "execution_count": 10, 221 | "metadata": {}, 222 | "output_type": "execute_result" 223 | } 224 | ], 225 | "source": [ 226 | "temp = {}\n", 227 | "num = 0\n", 228 | "for each in idx_emoji:\n", 229 | " temp[num] = 
each\n", 230 | " num += 1\n", 231 | "temp" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 3, 237 | "metadata": {}, 238 | "outputs": [ 239 | { 240 | "ename": "NameError", 241 | "evalue": "name 'pickle' is not defined", 242 | "output_type": "error", 243 | "traceback": [ 244 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 245 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 246 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'data/plain_dataset.pickle'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'wb'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mfout\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mpickle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdump\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mplain_x\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mplain_y\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 247 | "\u001b[0;31mNameError\u001b[0m: name 'pickle' is not defined" 248 | ] 249 | } 250 | ], 251 | "source": [ 252 | "with open('data/plain_dataset.pickle', 'wb') as fout:\n", 253 | " pickle.dump((plain_x, plain_y), fout)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 17, 259 | "metadata": { 260 | "collapsed": true 261 | }, 262 | "outputs": [], 263 | "source": [ 264 | "with open('data/plain_dataset_meta.pickle', 'wb') as fout:\n", 265 | " pickle.dump({\n", 266 | " 'alphabet': ['[PAD]', '[START]', '[OOV]'] + alphabet,\n", 267 | " 'emoji': list(dataset.keys()),\n", 268 | " 'alphabet_idx': idx_alphabet,\n", 269 | " 'emoji_idx': idx_emoji,\n", 270 | " }, fout)" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 10, 276 | 
"metadata": { 277 | "collapsed": true 278 | }, 279 | "outputs": [], 280 | "source": [ 281 | "import json\n", 282 | "with open('emoji_tmp.json','w') as fin:\n", 283 | " fin.write(json.dumps(idx_emoji))" 284 | ] 285 | } 286 | ], 287 | "metadata": { 288 | "kernelspec": { 289 | "display_name": "Python 3", 290 | "language": "python", 291 | "name": "python3" 292 | }, 293 | "language_info": { 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "file_extension": ".py", 299 | "mimetype": "text/x-python", 300 | "name": "python", 301 | "nbconvert_exporter": "python", 302 | "pygments_lexer": "ipython3", 303 | "version": "3.6.1" 304 | } 305 | }, 306 | "nbformat": 4, 307 | "nbformat_minor": 2 308 | } 309 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | from keras.callbacks import TensorBoard 4 | from keras.preprocessing import sequence 5 | from keras.models import Sequential, Model 6 | from keras.layers import Dense, Embedding, Lambda, Input, Dropout 7 | from keras.layers import LSTM, Conv1D, MaxPooling1D, GlobalAveragePooling1D 8 | from keras.layers.wrappers import Bidirectional 9 | from keras.datasets import imdb 10 | import keras 11 | import numpy as np 12 | import pickle 13 | import random 14 | 15 | 16 | print('Opening..') 17 | with open('data/plain_dataset.pickle', 'rb') as fin: 18 | X, Y = pickle.load(fin) 19 | 20 | 21 | # Shuffle 22 | NEED_SHUFFLE = False 23 | if NEED_SHUFFLE: 24 | print('Shuffling..') 25 | rand_indices = list(range(len(X))) 26 | random.shuffle(rand_indices) 27 | rand_X = [] 28 | rand_Y = [] 29 | for i in range(len(X)): 30 | t = rand_indices[i] 31 | rand_X.append(X[t]) 32 | rand_Y.append(Y[t]) 33 | X, Y = rand_X, rand_Y 34 | 35 | 36 | print('Preparing..') 37 | num_alphabet = 36 # (3+33) 38 | num_cat = 99 # (1+98) 39 | 40 | SAMPLES = 1000000 # 1000000 41 | TEST_SAMPLES = 
10000 42 | num_train = SAMPLES 43 | num_test = TEST_SAMPLES 44 | 45 | train_X = X[:num_train] 46 | train_Y = Y[:num_train] 47 | test_X = X[num_train:(num_train + num_test)] 48 | test_Y = Y[num_train:(num_train + num_test)] 49 | 50 | train_Y = keras.utils.to_categorical(train_Y, num_cat) 51 | test_Y = keras.utils.to_categorical(test_Y, num_cat) 52 | 53 | MAXLEN = 120 # maximum sequence length; defined here because pad_sequences below needs it 54 | train_X = sequence.pad_sequences(train_X, maxlen=MAXLEN) 55 | test_X = sequence.pad_sequences(test_X, maxlen=MAXLEN) 56 | print('x train shape:', train_X.shape, 'y:', train_Y.shape) 57 | print('x test shape:', test_X.shape, 'y:', test_Y.shape) 58 | 59 | def binarize(x, sz=num_alphabet): 60 | from keras.backend import tf 61 | return tf.to_float(tf.one_hot(x, sz, on_value=1, off_value=0, axis=-1)) 62 | def binarize_outshape(in_shape): 63 | return in_shape[0], in_shape[1], 36 # num_alphabet 64 | 65 | 66 | print('Building model...') 67 | 68 | 69 | in_sentence = Input(shape=(MAXLEN,), dtype='int32') 70 | embedding = Lambda(binarize, output_shape=binarize_outshape)(in_sentence) # (batch, length, chars) 71 | # embedding = Embedding(num_alphabet, num_alphabet)(in_sentence) 72 | filter_length = [3, 3, 1] 73 | nb_filter = [196, 196, 256] 74 | pool_length = 2 75 | for i in range(len(nb_filter)): 76 | embedding = Conv1D(filters=nb_filter[i], 77 | kernel_size=filter_length[i], 78 | padding='valid', 79 | activation='relu', 80 | kernel_initializer='glorot_normal', 81 | strides=1)(embedding) 82 | embedding = MaxPooling1D(pool_size=pool_length)(embedding) 83 | hidden = Bidirectional(LSTM( 84 | 128, dropout=0.2, recurrent_dropout=0.2))(embedding) 85 | hidden = Dense(128, activation='relu')(hidden) 86 | hidden = Dropout(0.2)(hidden) 87 | output = Dense(num_cat, activation='softmax')(hidden) 88 | 89 | model = Model(inputs=in_sentence, outputs=output) 90 | model.compile(loss='categorical_crossentropy', 91 | optimizer='adam', 92 | metrics=['accuracy', 'top_k_categorical_accuracy']) 93 | 94 | 95 | print('Training...')
96 | BATCH_SIZE = 128 97 | cbTensorBoard = TensorBoard(log_dir='./Graph', histogram_freq=1, 98 | write_graph=True, write_images=False) 99 | model.fit(train_X, train_Y, 100 | batch_size=BATCH_SIZE, 101 | epochs=11, 102 | validation_data=(test_X, test_Y), 103 | callbacks=[cbTensorBoard]) 104 | 105 | 106 | model.save('p5-40-test.hdf5') 107 | 108 | --------------------------------------------------------------------------------
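The input encoding shared by `tokenize_dataset.ipynb` and `train.py` can be hard to follow from the two files alone, so here is a minimal plain-Python sketch of it: characters are mapped to indices after a start marker, the sequence is left-padded to a fixed length, and each index becomes a one-hot row. The tiny alphabet, the short `MAXLEN`, and the helper names `encode`, `pad_pre`, and `one_hot` are assumptions made for illustration; the repository itself uses the 33 most frequent characters, `MAXLEN = 120`, Keras `pad_sequences`, and a TensorFlow `one_hot` inside a `Lambda` layer.

```python
# Illustrative sketch only: mirrors the encoding steps in plain Python.
# The alphabet is a made-up stand-in, not the real 33-character one.
PAD, CH_START, CH_OOV = 0, 1, 2   # reserved indices, as in the notebook
MAXLEN = 8                        # train.py uses 120; shortened for readability

alphabet = list("etaoin ")        # hypothetical alphabet
idx_alphabet = {ch: idx + 3 for idx, ch in enumerate(alphabet)}


def encode(sentence):
    """Map a sentence to indices, dropping characters outside the alphabet."""
    seq = [CH_START]
    seq.extend(idx_alphabet[ch] for ch in sentence if ch in idx_alphabet)
    return seq


def pad_pre(seq, maxlen=MAXLEN):
    """Keep the last maxlen items and left-pad with PAD (pad_sequences defaults)."""
    seq = seq[-maxlen:]
    return [PAD] * (maxlen - len(seq)) + seq


def one_hot(seq, size):
    """Turn each index into a one-hot row of floats, like the binarize Lambda."""
    return [[1.0 if i == idx else 0.0 for i in range(size)] for idx in seq]


num_alphabet = 3 + len(alphabet)  # PAD, START, OOV plus the alphabet
indices = pad_pre(encode("tea!"))
matrix = one_hot(indices, num_alphabet)
print(indices)                      # padded index sequence
print(len(matrix), len(matrix[0]))  # MAXLEN rows, num_alphabet columns
```

As in the notebook, out-of-alphabet characters (the `!` here) are dropped rather than mapped to `CH_OOV`, so the index 2 is reserved but never emitted.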