├── .gitignore ├── LICENSE ├── README.md ├── freeze_graph.py ├── inference.py ├── postprocess.py ├── saved_params ├── maya_close_face.txt └── wav_mean_std.csv ├── test_audio ├── coffee_xxxx.wav └── visemenet_intro.wav └── visemenet_frozen.pb /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # visemenet-inference 2 | - Inference Demo of ["VisemeNet-tensorflow"](https://github.com/yzhou359/VisemeNet_tensorflow) 3 | * VisemeNet is an audio-driven, animator-centric speech animation model that drives a JALI or standard FACS-based face rig from input audio. 4 | * The original repo is outdated, and setting up its environment to test the pretrained model is difficult. This repo provides a clean inference module based on the original author's code.
5 | 6 | ## Model Inference 7 | [Colab Demo](https://colab.research.google.com/drive/1dS4chsaFdC1D3vBaIaqIhBNaGSzXtzAc?usp=sharing) 8 | 9 | - This repo provides simple, clean inference code without any unnecessary extras 10 | - It is compatible with TensorFlow 2.x 11 | 12 | ### Requirements 13 | * TensorFlow 2.x 14 | * numpy 15 | * scipy 16 | * python_speech_features 17 | 18 | ### How to run inference 19 | ```python 20 | import numpy as np 21 | from inference import VisemeRegressor 22 | 23 | pb_filepath = "./visemenet_frozen.pb" 24 | wav_file_path = "./test_audio.wav" 25 | out_txt_path = "./maya_viseme_outputs.txt" 26 | 27 | viseme_regressor = VisemeRegressor(pb_filepath=pb_filepath) 28 | 29 | viseme_outputs = viseme_regressor.predict_outputs(wav_file_path=wav_file_path) 30 | 31 | np.savetxt(out_txt_path, viseme_outputs, '%.4f') 32 | ``` 33 | 34 | ## How to freeze graph 35 | - This repo does not require a Bazel build for the freeze-graph step 36 | - Thanks to https://github.com/lighttransport/VisemeNet-infer for the helpful examples. 37 | 38 | ### Requirements 39 | * Python 3.6.x using ["pyenv"](https://github.com/pyenv/pyenv) 40 | * TensorFlow 1.1.0 41 | 42 | 1. Set up the environment and packages 43 | ```shell 44 | # Install a virtualenv using pyenv 45 | pyenv install 3.6.5 46 | pyenv virtualenv 3.6.5 visemenet-freeze 47 | pyenv activate visemenet-freeze 48 | ``` 49 | ```shell 50 | # Install packages 51 | pip install tensorflow==1.1.0 52 | ``` 53 | 54 | 2. Clone the repo 55 | ```shell 56 | # Clone the VisemeNet repo and download the pretrained model 57 | git clone https://github.com/yzhou359/VisemeNet_tensorflow.git 58 | curl -L https://www.dropbox.com/sh/7nbqgwv0zz8pbk9/AAAghy76GVYDLqPKdANcyDuba?dl=0 > pretrained_model.zip 59 | unzip pretrained_model.zip -d VisemeNet_tensorflow/data/ckpt/pretrain_biwi/ 60 | ``` 61 | 62 | 3.
Freeze Graph and Save as pb 63 | ```shell 64 | # Freeze Graph 65 | python freeze_graph.py 66 | ``` 67 | -------------------------------------------------------------------------------- /freeze_graph.py: -------------------------------------------------------------------------------- 1 | # Note: only works with TensorFlow 1.1.0 2 | import sys 3 | sys.path.append("./VisemeNet_tensorflow/") 4 | 5 | import tensorflow as tf 6 | from src.model import model 7 | from src.utl.load_param import model_dir 8 | 9 | 10 | def freeze_visemenet_graph(out_path): 11 | model_name = 'pretrain_biwi' 12 | 13 | with tf.Graph().as_default() as graph: 14 | 15 | init, net1_optim, net2_optim, all_optim, x, x_face_id, y_landmark, \ 16 | y_phoneme, y_lipS, y_maya_param, dropout, cost, tensorboard_op, pred, \ 17 | clear_op, inc_op, avg, batch_size_placeholder, phase = model() 18 | 19 | config = tf.ConfigProto() 20 | config.gpu_options.allow_growth = True 21 | sess = tf.Session(config=config) 22 | max_to_keep = 20 23 | saver = tf.train.Saver(max_to_keep=max_to_keep) 24 | 25 | OLD_CHECKPOINT_FILE = model_dir + model_name + '/' + model_name + '.ckpt' 26 | 27 | saver.restore(sess, OLD_CHECKPOINT_FILE) 28 | print("Model loaded: " + model_dir + model_name) 29 | 30 | ## For debugging 31 | # node_names = [node.name for node in sess.graph_def.node] 32 | # for node_name in node_names: 33 | # if node_name.find("net2_output") != -1: 34 | # print(node_name) 35 | 36 | output_names = ['net2_output/add_1', 'net2_output/add_4', 'net2_output/add_6'] 37 | frozen_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, output_names) 38 | 39 | with tf.gfile.GFile(out_path, 'wb') as f:  # binary mode: SerializeToString() returns bytes 40 | f.write(frozen_def.SerializeToString()) 41 | 42 | print("Saved protobuf to {}".format(out_path)) 43 | 44 | 45 | if __name__ == '__main__': 46 | out_path = "./visemenet_frozen.pb" 47 | freeze_visemenet_graph(out_path=out_path) 48 | --------------------------------------------------------------------------------
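As a side note on the constants used by the inference module: the RNN input width and the audio hop size follow directly from the sampling constants set in `VisemeRegressor.__init__`. A quick sanity check of that arithmetic (values copied from `inference.py`):

```python
# Constants copied from VisemeRegressor.__init__ in inference.py
fps = 25
mfcc_win_step_per_frame = 1
up_sample_rate = 4
window_size = 24
num_total_features = 65  # 13 MFCC + 26 logfbank + 26 SSC
n_steps = 8

# Hop between analysis windows: 25 fps upsampled 4x -> 10 ms per window step
winstep = 1.0 / fps / mfcc_win_step_per_frame / up_sample_rate
assert abs(winstep - 0.01) < 1e-12

# Per-RNN-step input width: 65 features * 24-frame window / 8 steps = 195
n_input = int(num_total_features * mfcc_win_step_per_frame * window_size / n_steps)
assert n_input == 195
```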
/inference.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import scipy.io.wavfile as wav 4 | from python_speech_features import logfbank, mfcc, ssc 5 | from postprocess import postprocess_model_outputs 6 | 7 | 8 | class VisemeRegressor(object): 9 | def __init__(self, pb_filepath): 10 | # Load frozen graph 11 | self.pb_filepath = pb_filepath 12 | self.graph = self._load_graph(self.pb_filepath) 13 | 14 | # Define hyper-params 15 | ## Sampling 16 | self.fps = 25 17 | self.mfcc_win_step_per_frame = 1 18 | self.up_sample_rate = 4 19 | self.win_length = 0.025 20 | self.winstep = 1.0 / self.fps / self.mfcc_win_step_per_frame / self.up_sample_rate 21 | self.window_size = 24 22 | 23 | ## Num Signal features 24 | self.num_mfcc = 13 25 | self.num_logfbank = 26 26 | self.num_ssc = 26 27 | self.num_total_features = 65 28 | 29 | ## Model Params 30 | self.n_steps = 8 31 | self.n_input = int(self.num_total_features * self.mfcc_win_step_per_frame * self.window_size / self.n_steps) 32 | self.n_landmark = 76 33 | self.n_face_id = 76 34 | self.n_phoneme = 21 35 | self.n_maya_params = 22 36 | 37 | def predict_outputs(self, wav_file_path, mean_std_csv_path='./saved_params/wav_mean_std.csv', close_face_txt_path='./saved_params/maya_close_face.txt'): 38 | # Define Input 39 | ## Preprocess wav file 40 | concat_feat = self._preprocess_wav( 41 | wav_file_path=wav_file_path, is_debug=False 42 | ) 43 | normalized_feat = self._normalize_input( 44 | concat_features=concat_feat, mean_std_csv_path=mean_std_csv_path 45 | ) 46 | target_wav_idxs = self._get_padded_indexes( 47 | normalized_feat=normalized_feat, window_size=self.window_size 48 | ) 49 | ## Prepare model input 50 | batch_size = concat_feat.shape[0] # Num Frames 51 | batch_x, batch_x_face_id = self._prepare_model_input( 52 | normalized_feat=normalized_feat, 53 | target_wav_idxs=target_wav_idxs, 54 | batch_size=batch_size, 55 |
close_face_txt_path=close_face_txt_path 56 | ) 57 | 58 | # Predict Outputs 59 | ## Input nodes 60 | x = self.graph.get_tensor_by_name('input/Placeholder_1:0') 61 | x_face_id = self.graph.get_tensor_by_name('input/Placeholder_2:0') 62 | phase = self.graph.get_tensor_by_name('input/phase:0') 63 | dropout = self.graph.get_tensor_by_name('net1_shared_rnn/Placeholder:0') 64 | 65 | ## Output nodes 66 | v_cls = self.graph.get_tensor_by_name('net2_output/add_1:0') 67 | v_reg = self.graph.get_tensor_by_name('net2_output/add_4:0') 68 | jali = self.graph.get_tensor_by_name('net2_output/add_6:0') 69 | 70 | with tf.compat.v1.Session(graph=self.graph) as sess: 71 | pred_v_cls, pred_v_reg, pred_jali = sess.run( 72 | [v_cls, v_reg, jali], 73 | feed_dict={ 74 | x: batch_x, 75 | x_face_id: batch_x_face_id, 76 | dropout: 0, phase: 0 77 | } 78 | ) 79 | pred_v_cls = self.sigmoid(pred_v_cls) 80 | 81 | # Postprocess Outputs - Smoothing and Clip based on the pre-calculated thresholds 82 | cls_output = np.concatenate([pred_jali, pred_v_cls], axis=1) 83 | reg_output = np.concatenate([pred_jali, pred_v_reg], axis=1) 84 | 85 | viseme_outputs = postprocess_model_outputs( 86 | reg_output=reg_output, cls_output=cls_output 87 | ) 88 | 89 | return viseme_outputs 90 | 91 | def _prepare_model_input(self, normalized_feat, target_wav_idxs, batch_size, close_face_txt_path): 92 | batch_x = np.zeros((batch_size, self.n_steps, self.n_input)) 93 | batch_x_face_id = np.zeros((batch_size, self.n_face_id)) 94 | # batch_x_pose = np.zeros((batch_size, 3)) 95 | # batch_y_landmark = np.zeros((batch_size, self.n_landmark)) 96 | # batch_y_phoneme = np.zeros((batch_size, self.n_phoneme)) 97 | # batch_y_lipS = np.zeros((batch_size, 1)) 98 | # batch_y_maya_param = np.zeros((batch_size, self.n_maya_params)) 99 | 100 | for i in range(0, batch_size): 101 | batch_x[i] = normalized_feat[target_wav_idxs[i]].reshape((-1, self.n_steps, self.n_input)) 102 | 103 | close_face = np.loadtxt(close_face_txt_path) 104 | 
batch_x_face_id = np.tile(close_face, (batch_size, 1)) 105 | 106 | return batch_x, batch_x_face_id 107 | 108 | def _get_padded_indexes(self, normalized_feat, window_size): 109 | # Get padded indexes based on the given window size 110 | num_frames = normalized_feat.shape[0] 111 | wav_idxs = [i for i in range(0, num_frames)] 112 | 113 | half_win_size = window_size // 2 114 | pad_head = [0 for _ in range(half_win_size)] 115 | pad_tail = [wav_idxs[-1] for _ in range(half_win_size)] 116 | padded_idxs = np.array(pad_head + wav_idxs + pad_tail) 117 | 118 | target_wav_idxs = np.zeros(shape=(num_frames, window_size)).astype(int) 119 | for i in range(0, num_frames): 120 | target_wav_idxs[i] = padded_idxs[i:i+window_size].reshape(-1, window_size) 121 | 122 | return target_wav_idxs 123 | 124 | def _normalize_input(self, concat_features, mean_std_csv_path): 125 | # Normalize input using the pre-calculated mean, std values 126 | num_features = self.num_mfcc + self.num_logfbank + self.num_ssc 127 | 128 | mean_std = np.loadtxt(mean_std_csv_path) 129 | mean_vals = mean_std[:num_features] 130 | std_vals = mean_std[num_features:] 131 | 132 | normalized_feat = (concat_features - mean_vals) / std_vals 133 | 134 | return normalized_feat 135 | 136 | def _preprocess_wav(self, wav_file_path, is_debug=False): 137 | sample_rate, signal = wav.read(wav_file_path) 138 | 139 | if signal.ndim > 1: 140 | signal = signal[:, 0]  # use the first channel of multi-channel audio 141 | 142 | # Get concatenated features 143 | ## 1. mfcc_features 144 | mfcc_feat = mfcc( 145 | signal, numcep=self.num_mfcc, 146 | samplerate=sample_rate, 147 | winlen=self.win_length, winstep=self.winstep 148 | ) 149 | 150 | ## 2. logfbank_features 151 | logfbank_feat = logfbank( 152 | signal, nfilt=self.num_logfbank, 153 | samplerate=sample_rate, 154 | winlen=self.win_length, winstep=self.winstep 155 | ) 156 | 157 | ## 3.
ssc_features 158 | ssc_feat = ssc( 159 | signal, nfilt=self.num_ssc, 160 | samplerate=sample_rate, 161 | winlen=self.win_length, winstep=self.winstep 162 | ) 163 | 164 | concat_features = np.concatenate( 165 | [mfcc_feat, logfbank_feat, ssc_feat], axis=1 166 | ) 167 | 168 | target_frames = int(concat_features.shape[0] / self.mfcc_win_step_per_frame / self.up_sample_rate) 169 | mfcc_lines = concat_features[:target_frames * self.mfcc_win_step_per_frame * self.up_sample_rate] 170 | 171 | if is_debug: 172 | print("Sample Rate: {}".format(sample_rate)) 173 | print("Signal Shape: {}".format(signal.shape)) 174 | print("") 175 | print("Collect Features") 176 | print("[mfcc feat shape]: {}".format(mfcc_feat.shape)) 177 | print("[logfbank feat shape]: {}".format(logfbank_feat.shape)) 178 | print("[ssc feat shape]: {}".format(ssc_feat.shape)) 179 | print("--> Concat Features Shape: {}".format(concat_features.shape)) 180 | 181 | return mfcc_lines 182 | 183 | def _load_graph(self, pb_filepath): 184 | with tf.io.gfile.GFile(pb_filepath, 'rb') as f: 185 | graph_def = tf.compat.v1.GraphDef() 186 | graph_def.ParseFromString(f.read()) 187 | 188 | with tf.Graph().as_default() as graph: 189 | tf.import_graph_def(graph_def, name='') 190 | 191 | for op in graph.get_operations(): 192 | if op.type == 'Placeholder': 193 | print(op.name) 194 | 195 | return graph 196 | 197 | def sigmoid(self, x): 198 | return 1 / (1 + np.exp(-x)) 199 | -------------------------------------------------------------------------------- /postprocess.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy as sp; import scipy.signal  # `import scipy` alone may not expose sp.signal 3 | 4 | MAYA_PHONEME_NAMES = ['Ah', 'Aa', 'Eh', 'Ee', 'Ih', 'Oh', 'Uh', 'U', 'Eu', 5 | 'Schwa', 'R', 'S', 'ShChZh', 'Th', 6 | 'JY', 'LNTD', 'GK', 'MBP', 'FV', 'W'] 7 | 8 | PHONEME_THRESHOLD = np.array([0.12, 0.23, 0.18, 0.02, 10, 0.19, 0.18, 0.05, 10, 0.16, 9 | 0.18, 0.29, 0.29, 0.27, 10, 10, 10, 0.004, 0.29, 0.16]) 10 | 11 | #
PHONEME_THRESHOLD = np.array([0.35, 0.23, 0.18, 0.17, 10, 0.19, 0.18, 0.19, 10, 0.16, 12 | # 0.18, 0.29, 0.29, 0.27, 10, 10, 10, 0.004, 0.29, 0.16]) # perfect 13 | 14 | def smooth(x, window_len, window='hanning'): 15 | if window_len < 3: 16 | return x 17 | 18 | if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']: 19 | raise ValueError("window must be one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'") 20 | 21 | s = np.r_[x[window_len - 1:0:-1], x, x[-2:-window_len - 1:-1]] 22 | 23 | if window == 'flat': # moving average 24 | w = np.ones(window_len, 'd') 25 | else: 26 | w = getattr(np, window)(window_len) 27 | 28 | y = np.convolve(w / w.sum(), s, mode='valid') 29 | 30 | return y 31 | 32 | 33 | def postprocess_model_outputs(reg_output, cls_output): 34 | """Postprocess raw outputs of VisemeNet. 35 | Args: 36 | reg_output: Shape (num_frames, 22) 37 | cls_output: Shape (num_frames, 22) 38 | Returns: 39 | viseme_outputs: JALI-based lip blendshape coefficients 40 | """ 41 | assert reg_output.shape == cls_output.shape 42 | 43 | num_frames, num_maya_params = reg_output.shape 44 | num_translate = 2 45 | for i in range(num_translate, num_maya_params): 46 | # Cls. output 47 | cls_output[2:-3, i] = sp.signal.medfilt(cls_output[2:-3, i], kernel_size=[9]) 48 | cls_output[:, i] = smooth(cls_output[:, i], window_len=9)[4:-4] 49 | # Reg.
output 50 | reg_output[:, i] = sp.signal.medfilt(reg_output[:, i], kernel_size=[9]) 51 | reg_output[:, i] = smooth(reg_output[:, i], window_len=9)[4:-4] 52 | 53 | viseme_outputs = np.zeros_like(cls_output) 54 | viseme_outputs[:, 0] = smooth(cls_output[:, 0], window_len=15)[7:-7] 55 | viseme_outputs[:, 1] = smooth(cls_output[:, 1], window_len=15)[7:-7] 56 | 57 | for i in range(num_translate, num_maya_params): 58 | tmp = cls_output[:, i] * reg_output[:, i] 59 | l_idx = tmp > PHONEME_THRESHOLD[i-2] 60 | viseme_outputs[l_idx, i] = reg_output[l_idx, i] 61 | 62 | viseme_outputs[:, i] = smooth(viseme_outputs[:, i], window_len=15)[7:-7] 63 | 64 | r = 0 65 | while r < viseme_outputs.shape[0]: 66 | if viseme_outputs[r, i] > 0.1: 67 | active_begin = r 68 | for r2 in range(r, viseme_outputs.shape[0]): 69 | if viseme_outputs[r2, i] < 0.1 or r2 == viseme_outputs.shape[0] - 1: 70 | active_end = r2 71 | r = r2 72 | break 73 | 74 | if (active_begin == active_end): 75 | break 76 | max_reg = np.max(reg_output[active_begin:active_end, i]) 77 | max_pred = np.max(viseme_outputs[active_begin:active_end, i]) 78 | rate = max_reg / max_pred 79 | viseme_outputs[active_begin:active_end, i] = viseme_outputs[active_begin:active_end, i] * rate 80 | r += 1 81 | viseme_outputs[:, i] = smooth(viseme_outputs[:, i], 15)[7:-7] 82 | 83 | r = 0 84 | while r < viseme_outputs.shape[0]: 85 | if viseme_outputs[r, i] > 0.1: 86 | active_begin = r 87 | for r2 in range(r, viseme_outputs.shape[0]): 88 | if viseme_outputs[r2, i] < 0.1 or r2 == viseme_outputs.shape[0] - 1: 89 | active_end = r2 90 | r = r2 91 | break 92 | 93 | max_reg = np.max(reg_output[active_begin:active_end, i]) 94 | if(i==19 or i==20 or i==21): 95 | if(max_reg>0.7): 96 | max_reg = 1 97 | max_pred = np.max(viseme_outputs[active_begin:active_end, i]) 98 | rate = max_reg / max_pred 99 | viseme_outputs[active_begin:active_end, i] = viseme_outputs[active_begin:active_end, i] * rate 100 | r += 1 101 | 102 | return viseme_outputs 103 | 
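The core gating step in `postprocess_model_outputs` keeps a viseme's regression value only where classifier confidence times regression magnitude exceeds that phoneme's threshold. A minimal NumPy sketch of just that step, on made-up values (the 0.2 threshold is illustrative, not an actual `PHONEME_THRESHOLD` entry):

```python
import numpy as np

# Made-up per-frame values for a single viseme channel
cls = np.array([0.1, 0.6, 0.9, 0.8, 0.2])  # classifier confidence
reg = np.array([0.5, 0.5, 0.7, 0.4, 0.3])  # regressed coefficient
threshold = 0.2                            # illustrative, cf. PHONEME_THRESHOLD

out = np.zeros_like(reg)
mask = cls * reg > threshold  # gate on confidence * magnitude
out[mask] = reg[mask]         # keep the regression value where the gate fires
print(out)                    # frames below the threshold stay 0
```

The real function then re-smooths the gated signal and rescales each active segment so its peak matches the regression peak.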
-------------------------------------------------------------------------------- /saved_params/maya_close_face.txt: -------------------------------------------------------------------------------- 1 | -0.63522 -0.63037 -0.43071 -0.80431 -0.21347 -0.95468 -0.00000 -0.98242 0.21347 -0.95468 0.43071 -0.80456 0.63522 -0.63037 -0.20890 0.05236 -0.11252 0.02298 0.00000 0.00000 0.11252 0.02211 0.20890 0.05198 -0.05739 -0.20821 -0.16401 -0.21268 -0.22784 -0.23076 -0.32045 -0.27540 -0.22764 -0.42029 -0.10126 -0.46787 -0.00000 -0.48066 0.10126 -0.46797 0.22764 -0.42056 0.33245 -0.27810 0.22784 -0.23076 0.16401 -0.21282 0.05739 -0.20839 0.00000 -0.23967 -0.17054 -0.30394 -0.28709 -0.28996 -0.20255 -0.30883 -0.08933 -0.32174 -0.00000 -0.32644 0.08933 -0.32183 0.20255 -0.30948 0.29165 -0.28881 0.17054 -0.30394 0.06859 -0.31126 -0.00000 -0.32164 -0.06859 -0.31113 2 | -------------------------------------------------------------------------------- /saved_params/wav_mean_std.csv: -------------------------------------------------------------------------------- 1 | 16.68886 2 | 3.97479 3 | -20.30682 4 | 5.80699 5 | -15.43429 6 | 5.92388 7 | -28.63684 8 | 4.14146 9 | -19.75630 10 | -1.72316 11 | -10.24419 12 | 5.74067 13 | -3.86981 14 | 9.38637 15 | 10.30149 16 | 11.62886 17 | 12.49301 18 | 12.46678 19 | 11.90186 20 | 11.84886 21 | 11.95487 22 | 12.28070 23 | 12.54658 24 | 12.54819 25 | 12.67694 26 | 12.96691 27 | 12.98765 28 | 12.97866 29 | 12.62430 30 | 12.24909 31 | 11.83247 32 | 11.93958 33 | 11.98562 34 | 12.08022 35 | 11.97421 36 | 11.62867 37 | 10.93464 38 | 8.95501 39 | 7.04081 40 | 87.12891 41 | 173.25781 42 | 292.47222 43 | 440.17648 44 | 596.93069 45 | 769.05987 46 | 974.09208 47 | 1206.45156 48 | 1504.64536 49 | 1807.83956 50 | 2177.74605 51 | 2595.55055 52 | 3045.21528 53 | 3550.92647 54 | 4109.56849 55 | 4749.31868 56 | 5456.19422 57 | 6408.52870 58 | 7440.71526 59 | 8554.42553 60 | 9794.58080 61 | 11161.64710 62 | 12757.65047 63 | 14405.03992 64 | 15984.00054 65 | 
19289.26037 66 | 2.63713 67 | 15.46299 68 | 13.12569 69 | 17.44463 70 | 16.31907 71 | 16.44532 72 | 17.10880 73 | 17.89937 74 | 16.92003 75 | 15.19639 76 | 14.47684 77 | 13.30815 78 | 12.08762 79 | 2.62696 80 | 2.75437 81 | 2.83862 82 | 3.02088 83 | 3.12501 84 | 3.06590 85 | 2.99671 86 | 2.98436 87 | 2.99794 88 | 3.01999 89 | 2.97766 90 | 2.96631 91 | 3.07834 92 | 2.99264 93 | 2.93484 94 | 2.91424 95 | 2.81031 96 | 2.71607 97 | 2.81370 98 | 2.88491 99 | 2.96294 100 | 3.01507 101 | 3.03082 102 | 2.86959 103 | 2.59370 104 | 2.74709 105 | 1000.00000 106 | 1000.00000 107 | 26.24620 108 | 36.81842 109 | 38.78619 110 | 35.94704 111 | 41.87735 112 | 51.41814 113 | 68.31984 114 | 78.18270 115 | 85.86550 116 | 94.36517 117 | 99.69723 118 | 118.70094 119 | 136.14025 120 | 141.08837 121 | 179.08942 122 | 178.79265 123 | 180.67445 124 | 197.02229 125 | 204.96015 126 | 266.20027 127 | 254.49708 128 | 270.17589 129 | 540.48172 130 | 192.19627 131 | -------------------------------------------------------------------------------- /test_audio/coffee_xxxx.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/junhwanjang/visemenet-inference/c20ec783694bf326255f5369d6db2fdb4817954a/test_audio/coffee_xxxx.wav -------------------------------------------------------------------------------- /test_audio/visemenet_intro.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/junhwanjang/visemenet-inference/c20ec783694bf326255f5369d6db2fdb4817954a/test_audio/visemenet_intro.wav -------------------------------------------------------------------------------- /visemenet_frozen.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/junhwanjang/visemenet-inference/c20ec783694bf326255f5369d6db2fdb4817954a/visemenet_frozen.pb --------------------------------------------------------------------------------
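A closing note on the windowing in `inference.py`: `_get_padded_indexes` repeats the first and last frame indices by half a window so every frame receives a full context window. A simplified, standalone sketch of that behavior (toy frame count and window size):

```python
import numpy as np

def padded_indexes(num_frames, window_size):
    # Mirror of _get_padded_indexes: repeat edge indices by half a window
    half = window_size // 2
    idxs = list(range(num_frames))
    padded = np.array([0] * half + idxs + [idxs[-1]] * half)
    windows = np.zeros((num_frames, window_size), dtype=int)
    for i in range(num_frames):
        windows[i] = padded[i:i + window_size]
    return windows

w = padded_indexes(num_frames=5, window_size=4)
print(w[0])   # [0 0 0 1] -- head padded with the first index
print(w[-1])  # [2 3 4 4] -- tail padded with the last index
```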