├── .github └── ISSUE_TEMPLATE │ ├── bug_report.md │ └── feature_request.md ├── .gitignore ├── Add dendro-group cats.ipynb ├── CODE_OF_CONDUCT.md ├── Category_colors.ipynb ├── Filter using names.ipynb ├── Fix Enrichrgram category coloring.ipynb ├── Improved sim-mat control.ipynb ├── LICENSE ├── MANIFEST ├── Modify downsample.ipynb ├── README.md ├── RELEASE.md ├── Row filtering based on original data.ipynb ├── Test net updating.ipynb ├── Widget_View_Downsample.ipynb ├── add_cats method.ipynb ├── add_enrichr_cats.ipynb ├── clustergrammer ├── __init__.py ├── calc_clust.py ├── cat_pval.py ├── categories.py ├── data_formats.py ├── downsample_fun.py ├── enrichr_functions.py ├── export_data.py ├── iframe_web_app.py ├── initialize_net.py ├── load_data.py ├── load_vect_post.py ├── make_clust_fun.py ├── make_sim_mat.py ├── make_unique_labels.py ├── make_views.py ├── make_viz.py ├── normalize_fun.py ├── proc_df_labels.py └── run_filter.py ├── json ├── mult_view.json ├── mult_view_sim_col.json └── mult_view_sim_row.json ├── make_clustergrammer.py ├── make_stdin_stdout.py ├── python27 new import.ipynb ├── python35_new_import.ipynb ├── setup.cfg ├── setup.py └── txt ├── example_tsv.txt ├── rc_ptms.txt ├── rc_two_cats.txt └── rc_val_cats.txt /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Smartphone (please complete the following information):** 32 | - Device: [e.g. iPhone6] 33 | - OS: [e.g. iOS8.1] 34 | - Browser [e.g. stock browser, safari] 35 | - Version [e.g. 22] 36 | 37 | **Additional context** 38 | Add any other context about the problem here. 39 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 
21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Compiled source # 2 | ################### 3 | *.com 4 | *.class 5 | *.dll 6 | *.exe 7 | *.o 8 | *.so 9 | 10 | # Packages # 11 | ############ 12 | # it's better to unpack these files and commit the raw source 13 | # git has its own built in compression methods 14 | *.7z 15 | *.dmg 16 | *.gz 17 | *.iso 18 | *.jar 19 | *.rar 20 | *.tar 21 | *.zip 22 | node_modules 23 | 24 | # Logs and databases # 25 | ###################### 26 | *.log 27 | *.sql 28 | *.sqlite 29 | 30 | # OS generated files # 31 | ###################### 32 | .DS_Store 33 | .DS_Store? 34 | ._* 35 | .Spotlight-V100 36 | .Trashes 37 | ehthumbs.db 38 | Thumbs.db 39 | 40 | # cache files for sublime text 41 | *.tmlanguage.cache 42 | *.tmPreferences.cache 43 | *.stTheme.cache 44 | 45 | # workspace files are user-specific 46 | *.sublime-workspace 47 | *.sublime-project 48 | *.idea 49 | *.swo 50 | *.swp 51 | 52 | # sftp configuration file 53 | sftp-config.json 54 | 55 | #TernJS 56 | .tern-port 57 | 58 | # webpack 59 | *.js.map 60 | 61 | # python 62 | *.ipynb_checkpoints 63 | *.pyc 64 | 65 | # # eslint 66 | # .eslint* 67 | 68 | # how to retroactively use 69 | ############################ 70 | # git rm -r --cached . 71 | # git add . 72 | # git commit -m "fixing .gitignore" 73 | 74 | 75 | txt/ds_* -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to making participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 
39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies both within project spaces and in public spaces 49 | when an individual is representing the project or its community. Examples of 50 | representing a project or community include using an official project e-mail 51 | address, posting via an official social media account, or acting as an appointed 52 | representative at an online or offline event. Representation of a project may be 53 | further defined and clarified by project maintainers. 54 | 55 | ## Enforcement 56 | 57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 58 | reported by contacting the project team at nicolas.fernandez@mssm.edu. All 59 | complaints will be reviewed and investigated and will result in a response that 60 | is deemed necessary and appropriate to the circumstances. The project team is 61 | obligated to maintain confidentiality with regard to the reporter of an incident. 62 | Further details of specific enforcement policies may be posted separately. 63 | 64 | Project maintainers who do not follow or enforce the Code of Conduct in good 65 | faith may face temporary or permanent repercussions as determined by other 66 | members of the project's leadership. 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 72 | 73 | [homepage]: https://www.contributor-covenant.org 74 | 75 | For answers to common questions about this code of conduct, see 76 | https://www.contributor-covenant.org/faq 77 | -------------------------------------------------------------------------------- /Category_colors.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false, 8 | "deletable": true, 9 | "editable": true 10 | }, 11 | "outputs": [], 12 | "source": [ 13 | "import pandas as pd\n", 14 | "import numpy as np\n", 15 | "from clustergrammer import Network\n", 16 | "net = Network()" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": { 23 | "collapsed": false, 24 | "deletable": true, 25 | "editable": true 26 | }, 27 | "outputs": [], 28 | "source": [ 29 | "net.load_file('txt/ds_plasma.txt')" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": { 36 | "collapsed": false 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "net.load_file('txt/ds_plasma.txt')" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": { 47 | "collapsed": false, 48 | "deletable": true, 49 | "editable": true 50 | }, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/plain": [ 55 | "{'col': {'cat-0': {'Marker-type: phospho marker': '#17becf',\n", 56 | " 'Marker-type: surface marker': '#6b6ecf'}},\n", 57 | " 'row': {'cat-0': {'Majority-Treatment: Plasma': '#dbdb8d'}, 'cat-1': {}}}" 58 | ] 59 | }, 60 | "execution_count": 4, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | 
"net.viz['cat_colors']" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": null, 72 | "metadata": { 73 | "collapsed": false 74 | }, 75 | "outputs": [], 76 | "source": [] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 5, 81 | "metadata": { 82 | "collapsed": false 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "net.load_file('txt/ds_pma.txt')" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 6, 92 | "metadata": { 93 | "collapsed": false 94 | }, 95 | "outputs": [], 96 | "source": [ 97 | "df_pma = net.export_df()" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 7, 103 | "metadata": { 104 | "collapsed": false 105 | }, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "{'col': {'cat-0': {'Marker-type: phospho marker': '#17becf',\n", 111 | " 'Marker-type: surface marker': '#6b6ecf'}},\n", 112 | " 'row': {'cat-0': {'Majority-Treatment: PMA': '#c5b0d5',\n", 113 | " 'Majority-Treatment: Plasma': '#dbdb8d'},\n", 114 | " 'cat-1': {}}}" 115 | ] 116 | }, 117 | "execution_count": 7, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "net.viz['cat_colors']" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 8, 129 | "metadata": { 130 | "collapsed": false 131 | }, 132 | "outputs": [], 133 | "source": [ 134 | "# generate random matrix\n", 135 | "num_rows = 500\n", 136 | "num_cols = 10\n", 137 | "np.random.seed(seed=100)\n", 138 | "mat = np.random.rand(num_rows, num_cols)\n", 139 | "\n", 140 | "# make row and col labels\n", 141 | "rows = range(num_rows)\n", 142 | "cols = range(num_cols)\n", 143 | "rows = [str(i) for i in rows]\n", 144 | "cols = [str(i) for i in cols]\n", 145 | "\n", 146 | "# make dataframe \n", 147 | "df = pd.DataFrame(data=mat, columns=cols, index=rows)" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 9, 153 | "metadata": { 154 | "collapsed": false 155 | }, 156 | "outputs": [ 157 | { 158 | "data": { 159 | "text/plain": [ 160 | "{'col': {'cat-0': {'Marker-type: phospho marker': '#17becf',\n", 161 | " 'Marker-type: surface marker': '#6b6ecf'}},\n", 162 | " 'row': {'cat-0': {'Majority-Treatment: PMA': '#c5b0d5',\n", 163 | " 'Majority-Treatment: Plasma': '#dbdb8d'},\n", 164 | " 'cat-1': {}}}" 165 | ] 166 | }, 167 | "execution_count": 9, 168 | "metadata": {}, 169 | "output_type": "execute_result" 170 | } 171 | ], 172 | "source": [ 173 | "net.viz['cat_colors']" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 10, 179 | "metadata": { 180 | "collapsed": true 181 | }, 182 | "outputs": [], 183 | "source": [ 184 | "net.load_df(df)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 11, 190 | "metadata": { 191 | "collapsed": false 192 | }, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/plain": [ 197 | "{'col': {'cat-0': {'Marker-type: phospho marker': '#17becf',\n", 198 | " 'Marker-type: surface marker': '#6b6ecf'}},\n", 199 | " 'row': {'cat-0': {'Majority-Treatment: PMA': '#c5b0d5',\n", 200 | " 'Majority-Treatment: Plasma': '#dbdb8d'},\n", 201 | " 'cat-1': {}}}" 202 | ] 203 | }, 204 | "execution_count": 11, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "net.viz['cat_colors']" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": { 217 | "collapsed": true 218 | }, 219 | "outputs": [], 220 | "source": [] 221 | }, 222 | { 223 | 
"cell_type": "code", 224 | "execution_count": null, 225 | "metadata": { 226 | "collapsed": true 227 | }, 228 | "outputs": [], 229 | "source": [] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 12, 234 | "metadata": { 235 | "collapsed": false, 236 | "deletable": true, 237 | "editable": true 238 | }, 239 | "outputs": [], 240 | "source": [ 241 | "net.set_cat_color('col', 1, 'Category: one', 'blue')" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 13, 247 | "metadata": { 248 | "collapsed": false, 249 | "deletable": true, 250 | "editable": true 251 | }, 252 | "outputs": [ 253 | { 254 | "data": { 255 | "text/plain": [ 256 | "{'col': {'cat-0': {'Category: one': 'blue',\n", 257 | " 'Marker-type: phospho marker': '#17becf',\n", 258 | " 'Marker-type: surface marker': '#6b6ecf'}},\n", 259 | " 'row': {'cat-0': {'Majority-Treatment: PMA': '#c5b0d5',\n", 260 | " 'Majority-Treatment: Plasma': '#dbdb8d'},\n", 261 | " 'cat-1': {}}}" 262 | ] 263 | }, 264 | "execution_count": 13, 265 | "metadata": {}, 266 | "output_type": "execute_result" 267 | } 268 | ], 269 | "source": [ 270 | "net.viz['cat_colors']" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 14, 276 | "metadata": { 277 | "collapsed": true, 278 | "deletable": true, 279 | "editable": true 280 | }, 281 | "outputs": [], 282 | "source": [ 283 | "df = net.export_df()" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 15, 289 | "metadata": { 290 | "collapsed": false 291 | }, 292 | "outputs": [], 293 | "source": [ 294 | "net.load_df(df)\n", 295 | "df = df.transpose()" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 16, 301 | "metadata": { 302 | "collapsed": false 303 | }, 304 | "outputs": [ 305 | { 306 | "data": { 307 | "text/plain": [ 308 | "{'col': {'cat-0': {'Category: one': 'blue',\n", 309 | " 'Marker-type: phospho marker': '#17becf',\n", 310 | " 'Marker-type: surface marker': '#6b6ecf'}},\n", 311 | " 'row': {'cat-0': {'Majority-Treatment: PMA': '#c5b0d5',\n", 312 | " 'Majority-Treatment: Plasma': '#dbdb8d'},\n", 313 | " 'cat-1': {}}}" 314 | ] 315 | }, 316 | "execution_count": 16, 317 | "metadata": {}, 318 | "output_type": "execute_result" 319 | } 320 | ], 321 | "source": [ 322 | "net.viz['cat_colors']" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": { 329 | "collapsed": true 330 | }, 331 | "outputs": [], 332 | "source": [] 333 | } 334 | ], 335 | "metadata": { 336 | "kernelspec": { 337 | "display_name": "Python 2", 338 | "language": "python", 339 | "name": "python2" 340 | }, 341 | "language_info": { 342 | "codemirror_mode": { 343 | "name": "ipython", 344 | "version": 2 345 | }, 346 | "file_extension": ".py", 347 | "mimetype": "text/x-python", 348 | "name": "python", 349 | "nbconvert_exporter": "python", 350 | "pygments_lexer": "ipython2", 351 | "version": "2.7.12" 352 | } 353 | }, 354 | "nbformat": 4, 355 | "nbformat_minor": 2 356 | } 357 | -------------------------------------------------------------------------------- /Filter using names.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "from clustergrammer import Network\n", 12 | "from clustergrammer_widget import clustergrammer_widget\n", 13 | "net = Network()" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | 
"execution_count": 2, 19 | "metadata": { 20 | "collapsed": false 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "net.load_file('txt/rc_two_cats.txt')" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 3, 30 | "metadata": { 31 | "collapsed": false 32 | }, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | "output_type": "stream", 37 | "text": [ 38 | "filter_names\n", 39 | "['ROS1', 'AAK1']\n", 40 | "[('Gene: AAK1', 'Gene Type: Not Interesting'), ('Gene: ROS1', 'Gene Type: Interesting')]\n" 41 | ] 42 | } 43 | ], 44 | "source": [ 45 | "net.filter_names('row', ['ROS1', 'AAK1'])" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 4, 51 | "metadata": { 52 | "collapsed": false 53 | }, 54 | "outputs": [ 55 | { 56 | "data": { 57 | "text/plain": [ 58 | "(2, 29)" 59 | ] 60 | }, 61 | "execution_count": 4, 62 | "metadata": {}, 63 | "output_type": "execute_result" 64 | } 65 | ], 66 | "source": [ 67 | "net.dat['mat'].shape" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 5, 73 | "metadata": { 74 | "collapsed": false 75 | }, 76 | "outputs": [ 77 | { 78 | "name": "stdout", 79 | "output_type": "stream", 80 | "text": [ 81 | "filter_names\n", 82 | "['H1781', 'H661']\n", 83 | "[('Cell Line: H661', 'Category: five', 'Gender: Male'), ('Cell Line: H1781', 'Category: one', 'Gender: Female')]\n" 84 | ] 85 | } 86 | ], 87 | "source": [ 88 | "net.filter_names('col', ['H1781', 'H661'])" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 8, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "net.dat['mat'].shape\n", 100 | "net.make_clust()" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 9, 106 | "metadata": { 107 | "collapsed": false 108 | }, 109 | "outputs": [ 110 | { 111 | "data": { 112 | "application/vnd.jupyter.widget-view+json": { 113 | "model_id": "79b708096ef9427baa842f5e37f6622b" 114 | } 115 | }, 116 | "metadata": {}, 117 | "output_type": "display_data" 118 | } 119 | ], 120 | "source": [ 121 | "clustergrammer_widget(network=net.widget())" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": { 128 | "collapsed": true 129 | }, 130 | "outputs": [], 131 | "source": [] 132 | } 133 | ], 134 | "metadata": { 135 | "kernelspec": { 136 | "display_name": "Python [Root]", 137 | "language": "python", 138 | "name": "Python [Root]" 139 | }, 140 | "language_info": { 141 | "codemirror_mode": { 142 | "name": "ipython", 143 | "version": 2 144 | }, 145 | "file_extension": ".py", 146 | "mimetype": "text/x-python", 147 | "name": "python", 148 | "nbconvert_exporter": "python", 149 | "pygments_lexer": "ipython2", 150 | "version": "2.7.12" 151 | } 152 | }, 153 | "nbformat": 4, 154 | "nbformat_minor": 2 155 | } 156 | -------------------------------------------------------------------------------- /Fix Enrichrgram category coloring.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [] 11 | } 12 | ], 13 | "metadata": { 14 | "kernelspec": { 15 | "display_name": "Python [Root]", 16 | "language": "python", 17 | "name": "Python [Root]" 18 | }, 19 | "language_info": { 20 | "codemirror_mode": { 21 | "name": "ipython", 22 | "version": 2 23 | }, 24 | "file_extension": ".py", 25 | "mimetype": "text/x-python", 26 | "name": "python", 27 | 
"nbconvert_exporter": "python", 28 | "pygments_lexer": "ipython2", 29 | "version": "2.7.12" 30 | } 31 | }, 32 | "nbformat": 4, 33 | "nbformat_minor": 2 34 | } 35 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Nicolas Fernandez 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /MANIFEST: -------------------------------------------------------------------------------- 1 | # file GENERATED by distutils, do NOT edit 2 | setup.cfg 3 | setup.py 4 | clustergrammer/__init__.py 5 | clustergrammer/calc_clust.py 6 | clustergrammer/cat_pval.py 7 | clustergrammer/categories.py 8 | clustergrammer/data_formats.py 9 | clustergrammer/downsample_fun.py 10 | clustergrammer/enrichr_functions.py 11 | clustergrammer/export_data.py 12 | clustergrammer/iframe_web_app.py 13 | clustergrammer/initialize_net.py 14 | clustergrammer/load_data.py 15 | clustergrammer/load_vect_post.py 16 | clustergrammer/make_clust_fun.py 17 | clustergrammer/make_sim_mat.py 18 | clustergrammer/make_unique_labels.py 19 | clustergrammer/make_views.py 20 | clustergrammer/make_viz.py 21 | clustergrammer/normalize_fun.py 22 | clustergrammer/proc_df_labels.py 23 | clustergrammer/run_filter.py 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Clustergrammer Python Module 2 | The python module [clutergrammer.py](clustergrammer), takes a tab-separated matrix file as input (see format [here](#input-matrix-format)), calculates clustering, and generates the visualization json (see format [here](https://github.com/MaayanLab/clustergrammer-json)) for [clustergrammer.js](https://github.com/MaayanLab/clustergrammer). See an [example workflow](#example-workflow) below: 3 | 4 | 5 | Pleae see Clustergramer-PY's [documentation](http://clustergrammer.readthedocs.io/clustergrammer_py.html) for more information. 
6 | 7 | ## Installation 8 | The module can be used by downloading the source code here or by installing with [pip](https://pypi.python.org/pypi?:action=display&name=clustergrammer): 9 | 10 | ``` 11 | # python 2 12 | $ pip install clustergrammer 13 | 14 | # python 3 15 | $ pip3 install clustergrammer 16 | ``` 17 | 18 | ## Example Workflow 19 | ``` 20 | from clustergrammer import Network 21 | net = Network() 22 | 23 | # load matrix file 24 | net.load_file('txt/rc_two_cats.txt') 25 | 26 | # calculate clustering 27 | net.make_clust(dist_type='cos',views=['N_row_sum', 'N_row_var']) 28 | 29 | # write visualization json to file 30 | net.write_json_to_file('viz', 'json/mult_view.json') 31 | ``` 32 | The script [make_clustergrammer.py](make_clustergrammer.py) is used to generate the visualization jsons (see [json](https://github.com/MaayanLab/clustergrammer/tree/master/json) directory of the clustergrammer repo) for the examples pages on the [clustergrammer](https://github.com/MaayanLab/clustergrammer) repo. To visualize your own data modify the [make_clustergrammer.py](make_clustergrammer.py) script on the [clustergrammer](https://github.com/MaayanLab/clustergrammer) repo. 33 | 34 | ## Jupyter Notebook Examples 35 | 36 | ### Clustergrammer-Widget Example 37 | Clustergrammer can be used as a notebook extension widget. To install the widget use 38 | 39 | ``` 40 | # python 2 41 | $ pip install clustergrammer_widget 42 | 43 | # python 3 44 | $ pip3 install clustergrammer_widget 45 | ``` 46 | 47 | Within the Jupyter/IPython notebook the widget can be run using the following commands 48 | 49 | ``` 50 | # import the widget 51 | from clustergrammer_widget import * 52 | from copy import deepcopy 53 | 54 | # load data into new network instance and cluster 55 | net = deepcopy(Network()) 56 | net.load_file('rc_two_cats.txt') 57 | net.make_clust() 58 | 59 | # view the results as a widget 60 | clustergrammer_notebook(network = net.export_net_json()) 61 | ``` 62 | 63 | The [clustergrammer_widget](https://github.com/MaayanLab/clustergrammer-widget) repo contains the source code for the widget. 64 | 65 | ### IFrame Clustergrammer-web Results 66 | The python module can make an IFramed visualization in Jupyter/Ipython Python notebooks. See [Jupyter_Notebook_Example.ipynb](Jupyter_Notebook_Example.ipynb) for and example notebook or the example workflow below: 67 | 68 | ``` 69 | # upload a file to the clustergrammer web app and visualize using an Iframe 70 | from clustergrammer import Network 71 | from copy import deepcopy 72 | net = deepcopy(Network()) 73 | link = net.Iframe_web_app('txt/rc_two_cats.txt') 74 | print(link) 75 | ``` 76 | 77 | ## Clustergrammer Python Module API 78 | The python module, [clustergrammer.py](clustergrammer), allows users to upload a matrix, normalize or filter data, and make a visualization json for clustergrammer.js. 79 | 80 | The python module works in the following way. First, data is loaded into a data state (net.dat). Second, a clustered visualization json is calculated and saved in the viz state (net.viz). Third, the visualization object is exported as a json for clustergrammer.js. These three steps are shown in the [example workflow](#example-workflow) as: ```net.load_file```, ```net.make_clust```, and ```net.write_json_to_file```. 81 | 82 | The data state is similar to a Pandas Data Frame. A matrix also can be loaded directly as a [Data Frame](#df_to_dat) or [exported](#dat_to_df). 
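For example, a minimal sketch of this DataFrame round trip (the random matrix and the output path `json/df_view.json` are illustrative placeholders, not files shipped with the repo) might look like:

```
import numpy as np
import pandas as pd
from clustergrammer import Network

# stand-in numeric matrix with labeled rows and columns
df = pd.DataFrame(np.random.rand(10, 5),
                  index=['row-' + str(i) for i in range(10)],
                  columns=['col-' + str(i) for i in range(5)])

net = Network()
net.load_df(df)       # load the DataFrame into the data state (net.dat)
net.make_clust()      # calculate clustering and build the viz state (net.viz)
net.write_json_to_file('viz', 'json/df_view.json')

df_out = net.export_df()  # recover the current matrix as a Pandas DataFrame
```

Note that `load_df` re-initializes the `Network` object before loading, so any previously loaded data state is discarded.
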
83 | 84 | Below are the available functions in the ```Network``` object: 85 | 86 | ##### ```load_file(filename)``` 87 | Load a tsv file, given by filename, into the ```Network``` object (stored as ```net.dat```). 88 | 89 | ##### ```load_tsv_to_net(file_buffer)``` 90 | Load a file buffer directly into the ```Network``` object. 91 | 92 | ##### ```df_to_dat()``` 93 | This function loads a Pandas Data Frame into the ```net.dat``` state. This allows a user to directly load a Data Frame rather than have to load from a file. 94 | 95 | ##### ```swap_nan_for_zero()``` 96 | Swap all NaNs in a matrix for zeros. 97 | 98 | ##### ```filter_sum(inst_rc, threshold, take_abs=True)``` 99 | This is a filtering function that can be run before ```make_clust``` that performs a permanent filtering on rows/columns based on their sum. For instance, to filter the matrix to only include rows with a sum above a threshold, 100, do the following: ```net.filter_sum('row', threshold=100)```. Additional, filtered views can also be added using the ```views``` argument in ```make_clust```. 100 | 101 | ##### ```filter_N_top(inst_rc, N_top, rank_type='sum')``` 102 | This is a filtering function that can be run before ```make_clust``` that performs a permanent filtering on rows/columns based on their sum/variance and return the top ```N``` rows/columns with the greatest (absolute value) sum or variance. For instance, to filter a matrix with >100 rows down to the top 100 rows based on their sum do the following: ```net.filter_N_top('row', N_top=100, rank_type='sum')```. This is useful for pre-filtering very large matrices to make them easier to visualize. 103 | 104 | ##### ```filter_threshold(inst_rc, threshold, num_occur)``` 105 | This is a filtering function that can be run before ```make_clust``` that performs a permanent filterin on rows/columns based on whether ```num_occur``` of their values have an absolute value greater than ```threshold```. For instance, to filter a matrix to only include rows that have at least 3 values with an absolute value above 10 do the following: ```net.filter_threshold('row', threshold=3, num_occur=10)```. This is useful for filtering rows/columns that have the same or simlar sums and variances. 106 | 107 | ##### ```make_clust()``` 108 | Calculate clustering and produce a visualization object (stored as ```net.viz```). The optional arguments are listed below: 109 | 110 | - ```dist_type='cosine'``` The distance metric used to calculate the distance between all rows and columns (using Scipy). The defalt is cosine distance. 111 | 112 | - ```run_clustering=True``` This determines whether clustering will be calculated. The default is set to ```True```. If ```False``` is given then a visualization of the matrix in its original ordering will be returned. 113 | 114 | - ```dendro=True``` This determines whether a dendrogram will be included in the visualization. The default is True. 115 | 116 | - ```linkage_type='average'``` This determines the linkage type used by Scipy to perform hierarchical clustering. For more options (e.g. 'single', 'complete') and information see [hierarchy.linkage documentation](http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.cluster.hierarchy.linkage.html). 117 | 118 | - ```views=['N_row_sum', 'N_row_var']``` This determines which row-filtered views will be calculated for the clustergram. Filters can be based on sum or variance and the cutoffs can be defined in absolute numbers (```N```) or as a percentage of the number of rows (```pct```). 
These views are available on the front-end visualization using the sliders. The defalt is ```['N_row_sum', 'N_row_var']```. The four options are: 119 | - ```N_row_sum``` This indicates that additional row-filtered views should be calculated based on the sum of the values in the rows with cutoffs defined by absolute number. For instance, additional views will be calculated showing the top 500, 250, 100, 50, 20, and 10 rows based on the absolute sum of their values. 120 | 121 | - ```pct_row_sum``` This indicates that additional row-filtered views should be calculated based on the sum of the values in the rows with cutoffs defined by the percentage of rows. For instance, additional views will be calculated showing the top 10%, 20%, 30%, ... rows based on the absolute sum of their values. 122 | 123 | - ```N_row_var``` This indicates that additional row-filtered views should be calculated based on the variance of the values in the rows with cutoffs defined by absolute number. For instance, additional views will be calculated showing the top 500, 250, 100, 50, 20, and 10 rows based on the variance of their values. 124 | 125 | - ```pct_row_sum``` This indicates that additional row-filtered views should be calculated based on the variance of the values in the rows with cutoffs defined by the percentage of rows. For instance, additional views will be calculated showing the top 10%, 20%, 30%, ... rows based on the variance of their values. 126 | 127 | - ```sim_mat=False``` This determines whether row and column similarity matrix visualizations will be calculated from your input matrix. The default is ```False```. If it is set to ```True```, then the row and column distance matrices used to calculate hierarchical clustering will be convered to similarity matrices and clustered. These visualization jsons will be stored as ```net.sim['row']``` and ```net.sim['col']```. These can be exporeted for visualization using ```net.write_json_to_file('sim_row', 'sim_row.json')``` and an example of this can be seen in [make_clustergrammer.py](make_clustergrammer.py). 128 | 129 | ##### ```write_json_to_file(net_type, filename, indent='no-indent')``` 130 | This writes a json of the network object data, either ```net.viz``` or ```net.dat```, to a file. Choose ```'viz'``` in order to write a visualization json for clustergrammer.js, e.g. ```net.write_json_to_file('viz','clustergram.json')``` 131 | 132 | ##### ```write_matrix_to_tsv(filename, df=None)``` 133 | This write the matrix, stored in the network object, to a tsv file. Optional row/column categories are saved as tuples. See [tuple_cats.txt](txt/tuple_cats.txt) or [export.txt](txt/export.txt) for examples of the exported matrix file format. 134 | 135 | ##### ```export_net_json(net_type, indent='no-indent')``` 136 | This exports a json string from either ```net.dat``` or ```net.viz```. This is useful if a user wants the json, but does not want to first write to file. 137 | 138 | ##### ```dat_to_df()``` 139 | Export a matrix that has been loaded into the ```Network``` object as a Pandas Data Frame. -------------------------------------------------------------------------------- /RELEASE.md: -------------------------------------------------------------------------------- 1 | Publication Instructions 2 | ------------------------------- 3 | http://peterdowns.com/posts/first-time-with-pypi.html 4 | 5 | Updating Instructions 6 | ---------------------------- 7 | 8 | First, release a new version and push to github repo. 
9 | 10 | Then update the setup.py file to reflect the new version. 11 | 12 | adding a tag 13 | ----------------- 14 | git tag -a 0.1 -m "Adds a tag so that we can put this on PyPI." 15 | 16 | run registering and updating 17 | 18 | How to upgrade 19 | ***************** 20 | 1) After commiting changes, make a new release tag and change the setup.py file to reflect this 21 | 22 | 2) Then push with tags 23 | git push github master --tags 24 | 25 | 3) run the test registering and uploading using 26 | python setup.py register -r pypitest 27 | python setup.py sdist upload -r pypitest 28 | 29 | python setup.py register -r pypi 30 | python setup.py sdist upload -r pypi 31 | 32 | 33 | 4) upgrade package 34 | pip install clustergrammer --upgrade 35 | pip show clustergrammer -------------------------------------------------------------------------------- /Row filtering based on original data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [] 11 | } 12 | ], 13 | "metadata": { 14 | "kernelspec": { 15 | "display_name": "Python [Root]", 16 | "language": "python", 17 | "name": "Python [Root]" 18 | }, 19 | "language_info": { 20 | "codemirror_mode": { 21 | "name": "ipython", 22 | "version": 2 23 | }, 24 | "file_extension": ".py", 25 | "mimetype": "text/x-python", 26 | "name": "python", 27 | "nbconvert_exporter": "python", 28 | "pygments_lexer": "ipython2", 29 | "version": "2.7.12" 30 | } 31 | }, 32 | "nbformat": 4, 33 | "nbformat_minor": 2 34 | } 35 | -------------------------------------------------------------------------------- /clustergrammer/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy as np 3 | import pandas as pd 4 | from copy import deepcopy 5 | 6 | from . import initialize_net 7 | from . import load_data 8 | from . import export_data 9 | from . import load_vect_post 10 | from . import make_clust_fun 11 | from . import normalize_fun 12 | from . import data_formats 13 | from . import enrichr_functions as enr_fun 14 | from . import iframe_web_app 15 | from . import run_filter 16 | from . import downsample_fun 17 | from . import categories 18 | 19 | class Network(object): 20 | ''' 21 | version 1.13.6 22 | 23 | Clustergrammer.py takes a matrix as input (either from a file of a Pandas DataFrame), normalizes/filters, hierarchically clusters, and produces the :ref:`visualization_json` for :ref:`clustergrammer_js`. 24 | 25 | Networks have two states: 26 | 27 | 1. the data state, where they are stored as a matrix and nodes 28 | 2. the viz state where they are stored as viz.links, viz.row_nodes, and viz.col_nodes. 29 | 30 | The goal is to start in a data-state and produce a viz-state of 31 | the network that will be used as input to clustergram.js. 32 | ''' 33 | 34 | def __init__(self, widget=None): 35 | initialize_net.main(self, widget) 36 | 37 | def reset(self): 38 | ''' 39 | This re-initializes the Network object. 40 | ''' 41 | initialize_net.main(self) 42 | 43 | def load_file(self, filename): 44 | ''' 45 | Load TSV file. 46 | ''' 47 | load_data.load_file(self, filename) 48 | 49 | def load_file_as_string(self, file_string, filename=''): 50 | ''' 51 | Load file as a string. 52 | ''' 53 | load_data.load_file_as_string(self, file_string, filename=filename) 54 | 55 | 56 | def load_stdin(self): 57 | ''' 58 | Load stdin TSV-formatted string. 
59 | ''' 60 | load_data.load_stdin(self) 61 | 62 | def load_tsv_to_net(self, file_buffer, filename=None): 63 | ''' 64 | This will load a TSV matrix file buffer; this is exposed so that it will 65 | be possible to load data without having to read from a file. 66 | ''' 67 | load_data.load_tsv_to_net(self, file_buffer, filename) 68 | 69 | def load_vect_post_to_net(self, vect_post): 70 | ''' 71 | Load data in the vector format JSON. 72 | ''' 73 | load_vect_post.main(self, vect_post) 74 | 75 | def load_data_file_to_net(self, filename): 76 | ''' 77 | Load Clustergrammer's dat format (saved as JSON). 78 | ''' 79 | inst_dat = self.load_json_to_dict(filename) 80 | load_data.load_data_to_net(self, inst_dat) 81 | 82 | def cluster(self, dist_type='cosine', run_clustering=True, 83 | dendro=True, views=['N_row_sum', 'N_row_var'], 84 | linkage_type='average', sim_mat=False, filter_sim=0.1, 85 | calc_cat_pval=False, run_enrichr=None, enrichrgram=None): 86 | ''' 87 | The main function performs hierarchical clustering, optionally generates filtered views (e.g. row-filtered views), and generates the :``visualization_json``. 88 | ''' 89 | initialize_net.viz(self) 90 | 91 | make_clust_fun.make_clust(self, dist_type=dist_type, run_clustering=run_clustering, 92 | dendro=dendro, 93 | requested_views=views, 94 | linkage_type=linkage_type, 95 | sim_mat=sim_mat, 96 | filter_sim=filter_sim, 97 | calc_cat_pval=calc_cat_pval, 98 | run_enrichr=run_enrichr, 99 | enrichrgram=enrichrgram) 100 | 101 | def make_clust(self, dist_type='cosine', run_clustering=True, 102 | dendro=True, views=['N_row_sum', 'N_row_var'], 103 | linkage_type='average', sim_mat=False, filter_sim=0.1, 104 | calc_cat_pval=False, run_enrichr=None, enrichrgram=None): 105 | ''' 106 | ... Will be deprecated, renaming method cluster ... 107 | The main function performs hierarchical clustering, optionally generates filtered views (e.g. row-filtered views), and generates the :``visualization_json``. 108 | ''' 109 | print('make_clust method will be deprecated in next version, please use cluster method.') 110 | initialize_net.viz(self) 111 | 112 | make_clust_fun.make_clust(self, dist_type=dist_type, run_clustering=run_clustering, 113 | dendro=dendro, 114 | requested_views=views, 115 | linkage_type=linkage_type, 116 | sim_mat=sim_mat, 117 | filter_sim=filter_sim, 118 | calc_cat_pval=calc_cat_pval, 119 | run_enrichr=run_enrichr, 120 | enrichrgram=enrichrgram) 121 | 122 | def produce_view(self, requested_view=None): 123 | ''' 124 | This function is under development and will produce a single view on demand. 125 | ''' 126 | print('\tproduce a single view of a matrix, will be used for get requests') 127 | 128 | if requested_view != None: 129 | print('requested_view') 130 | print(requested_view) 131 | 132 | def swap_nan_for_zero(self): 133 | ''' 134 | Swaps all NaN (numpy NaN) instances for zero. 135 | ''' 136 | # # may re-instate this in some form 137 | # self.dat['mat_orig'] = deepcopy(self.dat['mat']) 138 | 139 | self.dat['mat'][np.isnan(self.dat['mat'])] = 0 140 | 141 | def load_df(self, df): 142 | ''' 143 | Load Pandas DataFrame. 
144 | ''' 145 | # self.__init__() 146 | self.reset() 147 | 148 | df_dict = {} 149 | df_dict['mat'] = deepcopy(df) 150 | # always define category colors if applicable when loading a df 151 | data_formats.df_to_dat(self, df_dict, define_cat_colors=True) 152 | 153 | def export_df(self): 154 | ''' 155 | Export Pandas DataFrame/ 156 | ''' 157 | df_dict = data_formats.dat_to_df(self) 158 | return df_dict['mat'] 159 | 160 | def df_to_dat(self, df, define_cat_colors=False): 161 | ''' 162 | Load Pandas DataFrame (will be deprecated). 163 | ''' 164 | data_formats.df_to_dat(self, df, define_cat_colors) 165 | 166 | def set_cat_color(self, axis, cat_index, cat_name, inst_color): 167 | 168 | if axis == 0: 169 | axis = 'row' 170 | if axis == 1: 171 | axis = 'col' 172 | 173 | try: 174 | # process cat_index 175 | cat_index = cat_index - 1 176 | cat_index = 'cat-' + str(cat_index) 177 | 178 | self.viz['cat_colors'][axis][cat_index][cat_name] = inst_color 179 | 180 | except: 181 | print('there was an error setting the category color') 182 | 183 | def dat_to_df(self): 184 | ''' 185 | Export Pandas DataFrams (will be deprecated). 186 | ''' 187 | return data_formats.dat_to_df(self) 188 | 189 | def export_net_json(self, net_type='viz', indent='no-indent'): 190 | ''' 191 | Export dat or viz JSON. 192 | ''' 193 | return export_data.export_net_json(self, net_type, indent) 194 | 195 | def export_viz_to_widget(self, which_viz='viz'): 196 | ''' 197 | Export viz JSON, for use with clustergrammer_widget. Formerly method was 198 | named widget. 199 | ''' 200 | 201 | return export_data.export_net_json(self, which_viz, 'no-indent') 202 | 203 | def widget(self, which_viz='viz'): 204 | ''' 205 | Generate a widget visualization using the widget. The export_viz_to_widget 206 | method passes the visualization JSON to the instantiated widget, which is 207 | returned and visualized on the front-end. 208 | ''' 209 | if hasattr(self, 'widget_class') == True: 210 | self.widget_instance = self.widget_class(network = self.export_viz_to_widget(which_viz)) 211 | 212 | return self.widget_instance 213 | else: 214 | print('Can not make widget because Network has no attribute widget_class') 215 | print('Please instantiate Network with clustergrammer_widget using: Network(clustergrammer_widget)') 216 | 217 | 218 | def widget_df(self): 219 | ''' 220 | Export a DataFrame from the front-end visualization. For instance, a user 221 | can filter to show only a single cluster using the dendrogram and then 222 | get a dataframe of this cluster using the widget_df method. 223 | ''' 224 | 225 | if hasattr(self, 'widget_instance') == True: 226 | 227 | if self.widget_instance.mat_string != '': 228 | 229 | tmp_net = deepcopy(Network()) 230 | 231 | df_string = self.widget_instance.mat_string 232 | 233 | tmp_net.load_file_as_string(df_string) 234 | 235 | df = tmp_net.export_df() 236 | 237 | return df 238 | 239 | else: 240 | return self.export_df() 241 | 242 | else: 243 | if hasattr(self, 'widget_class') == True: 244 | print('Please make the widget before exporting the widget DataFrame.') 245 | print('Do this using the widget method: net.widget()') 246 | 247 | else: 248 | print('Can not make widget because Network has no attribute widget_class') 249 | print('Please instantiate Network with clustergrammer_widget using: Network(clustergrammer_widget)') 250 | 251 | def write_json_to_file(self, net_type, filename, indent='no-indent'): 252 | ''' 253 | Save dat or viz as a JSON to file. 
254 | ''' 255 | export_data.write_json_to_file(self, net_type, filename, indent) 256 | 257 | def write_matrix_to_tsv(self, filename=None, df=None): 258 | ''' 259 | Export data-matrix to file. 260 | ''' 261 | return export_data.write_matrix_to_tsv(self, filename, df) 262 | 263 | def filter_sum(self, inst_rc, threshold, take_abs=True): 264 | ''' 265 | Filter a network's rows or columns based on the sum across rows or columns. 266 | ''' 267 | inst_df = self.dat_to_df() 268 | if inst_rc == 'row': 269 | inst_df = run_filter.df_filter_row_sum(inst_df, threshold, take_abs) 270 | elif inst_rc == 'col': 271 | inst_df = run_filter.df_filter_col_sum(inst_df, threshold, take_abs) 272 | self.df_to_dat(inst_df) 273 | 274 | def filter_N_top(self, inst_rc, N_top, rank_type='sum'): 275 | ''' 276 | Filter the matrix rows or columns based on sum/variance, and only keep the top 277 | N. 278 | ''' 279 | inst_df = self.dat_to_df() 280 | 281 | inst_df = run_filter.filter_N_top(inst_rc, inst_df, N_top, rank_type) 282 | 283 | self.df_to_dat(inst_df) 284 | 285 | def filter_threshold(self, inst_rc, threshold, num_occur=1): 286 | ''' 287 | Filter the matrix rows or columns based on num_occur values being above a 288 | threshold (in absolute value). 289 | ''' 290 | inst_df = self.dat_to_df() 291 | 292 | inst_df = run_filter.filter_threshold(inst_df, inst_rc, threshold, 293 | num_occur) 294 | 295 | self.df_to_dat(inst_df) 296 | 297 | def filter_cat(self, axis, cat_index, cat_name): 298 | ''' 299 | Filter the matrix based on their category. cat_index is the index of the category, the first category has index=1. 300 | ''' 301 | run_filter.filter_cat(self, axis, cat_index, cat_name) 302 | 303 | def filter_names(self, axis, names): 304 | ''' 305 | Filter the visualization using row/column names. The function takes, axis ('row'/'col') and names, a list of strings. 306 | ''' 307 | run_filter.filter_names(self, axis, names) 308 | 309 | def clip(self, lower=None, upper=None): 310 | ''' 311 | Trim values at input thresholds using pandas function 312 | ''' 313 | df = self.export_df() 314 | df = df.clip(lower=lower, upper=upper) 315 | self.load_df(df) 316 | 317 | def normalize(self, df=None, norm_type='zscore', axis='row', keep_orig=False): 318 | ''' 319 | Normalize the matrix rows or columns using Z-score (zscore) or Quantile Normalization (qn). Users can optionally pass in a DataFrame to be normalized (and this will be incorporated into the Network object). 320 | ''' 321 | normalize_fun.run_norm(self, df, norm_type, axis, keep_orig) 322 | 323 | def downsample(self, df=None, ds_type='kmeans', axis='row', num_samples=100, random_state=1000): 324 | ''' 325 | Downsample the matrix rows or columns (currently supporting kmeans only). Users can optionally pass in a DataFrame to be downsampled (and this will be incorporated into the network object). 326 | ''' 327 | 328 | return downsample_fun.main(self, df, ds_type, axis, num_samples, random_state) 329 | 330 | def random_sample(self, num_samples, df=None, replace=False, weights=None, random_state=100, axis='row'): 331 | ''' 332 | Return random sample of matrix. 
333 | ''' 334 | 335 | if df is None: 336 | df = self.dat_to_df() 337 | 338 | if axis == 'row': 339 | axis = 0 340 | if axis == 'col': 341 | axis = 1 342 | 343 | df = self.export_df() 344 | df = df.sample(n=num_samples, replace=replace, weights=weights, random_state=random_state, axis=axis) 345 | 346 | self.load_df(df) 347 | 348 | def add_cats(self, axis, cat_data): 349 | ''' 350 | Add categories to rows or columns using cat_data array of objects. Each object in cat_data is a dictionary with one key (category title) and value (rows/column names) that have this category. Categories will be added onto the existing categories and will be added in the order of the objects in the array. 351 | 352 | Example ``cat_data``:: 353 | 354 | 355 | [ 356 | { 357 | "title": "First Category", 358 | "cats": { 359 | "true": [ 360 | "ROS1", 361 | "AAK1" 362 | ] 363 | } 364 | }, 365 | { 366 | "title": "Second Category", 367 | "cats": { 368 | "something": [ 369 | "PDK4" 370 | ] 371 | } 372 | } 373 | ] 374 | 375 | 376 | ''' 377 | for inst_data in cat_data: 378 | categories.add_cats(self, axis, inst_data) 379 | 380 | def dendro_cats(self, axis, dendro_level): 381 | ''' 382 | Generate categories from dendrogram groups/clusters. The dendrogram has 11 383 | levels to choose from 0 -> 10. Dendro_level can be given as an integer or 384 | string. 385 | ''' 386 | categories.dendro_cats(self, axis, dendro_level) 387 | 388 | def Iframe_web_app(self, filename=None, width=1000, height=800): 389 | 390 | link = iframe_web_app.main(self, filename, width, height) 391 | 392 | return link 393 | 394 | def enrichrgram(self, lib, axis='row'): 395 | ''' 396 | Add Enrichr gene enrichment results to your visualization (where your rows 397 | are genes). Run enrichrgram before clustering to incldue enrichment results 398 | as row categories. Enrichrgram can also be run on the front-end using the 399 | Enrichr logo at the top left. 400 | 401 | Set lib to the Enrichr library that you want to use for enrichment analysis. 402 | Libraries included: 403 | 404 | * ChEA_2016 405 | * KEA_2015 406 | * ENCODE_TF_ChIP-seq_2015 407 | * ENCODE_Histone_Modifications_2015 408 | * Disease_Perturbations_from_GEO_up 409 | * Disease_Perturbations_from_GEO_down 410 | * GO_Molecular_Function_2015 411 | * GO_Biological_Process_2015 412 | * GO_Cellular_Component_2015 413 | * Reactome_2016 414 | * KEGG_2016 415 | * MGI_Mammalian_Phenotype_Level_4 416 | * LINCS_L1000_Chem_Pert_up 417 | * LINCS_L1000_Chem_Pert_down 418 | 419 | ''' 420 | 421 | df = self.export_df() 422 | df, bar_info = enr_fun.add_enrichr_cats(df, axis, lib) 423 | self.load_df(df) 424 | 425 | self.dat['enrichrgram_lib'] = lib 426 | self.dat['row_cat_bars'] = bar_info 427 | 428 | @staticmethod 429 | def load_gmt(filename): 430 | return load_data.load_gmt(filename) 431 | 432 | @staticmethod 433 | def load_json_to_dict(filename): 434 | return load_data.load_json_to_dict(filename) 435 | 436 | @staticmethod 437 | def save_dict_to_json(inst_dict, filename, indent='no-indent'): 438 | export_data.save_dict_to_json(inst_dict, filename, indent) -------------------------------------------------------------------------------- /clustergrammer/calc_clust.py: -------------------------------------------------------------------------------- 1 | def cluster_row_and_col(net, dist_type='cosine', linkage_type='average', 2 | dendro=True, run_clustering=True, run_rank=True, 3 | ignore_cat=False, calc_cat_pval=False, links=False): 4 | ''' cluster net.dat and make visualization json, net.viz. 
5 | optionally leave out dendrogram colorbar groups with dendro argument ''' 6 | 7 | import scipy 8 | from copy import deepcopy 9 | from scipy.spatial.distance import pdist 10 | from . import categories, make_viz, cat_pval 11 | 12 | dm = {} 13 | for inst_rc in ['row', 'col']: 14 | 15 | tmp_mat = deepcopy(net.dat['mat']) 16 | dm[inst_rc] = calc_distance_matrix(tmp_mat, inst_rc, dist_type) 17 | 18 | # save directly to dat structure 19 | node_info = net.dat['node_info'][inst_rc] 20 | 21 | node_info['ini'] = list(range( len(net.dat['nodes'][inst_rc]), -1, -1)) 22 | 23 | # cluster 24 | if run_clustering is True: 25 | node_info['clust'], node_info['group'] = \ 26 | clust_and_group(net, dm[inst_rc], linkage_type=linkage_type) 27 | else: 28 | dendro = False 29 | node_info['clust'] = node_info['ini'] 30 | 31 | # sorting 32 | if run_rank is True: 33 | node_info['rank'] = sort_rank_nodes(net, inst_rc, 'sum') 34 | node_info['rankvar'] = sort_rank_nodes(net, inst_rc, 'var') 35 | else: 36 | node_info['rank'] = node_info['ini'] 37 | node_info['rankvar'] = node_info['ini'] 38 | 39 | ################################## 40 | if ignore_cat is False: 41 | categories.calc_cat_clust_order(net, inst_rc) 42 | 43 | if calc_cat_pval is True: 44 | cat_pval.main(net) 45 | 46 | # make the visualization json 47 | make_viz.viz_json(net, dendro, links) 48 | 49 | return dm 50 | 51 | def calc_distance_matrix(tmp_mat, inst_rc, dist_type='cosine'): 52 | from scipy.spatial.distance import pdist 53 | import numpy as np 54 | 55 | if inst_rc == 'row': 56 | inst_dm = pdist(tmp_mat, metric=dist_type) 57 | elif inst_rc == 'col': 58 | inst_dm = pdist(tmp_mat.transpose(), metric=dist_type) 59 | 60 | inst_dm[inst_dm < 0] = float(0) 61 | 62 | return inst_dm 63 | 64 | def clust_and_group(net, inst_dm, linkage_type='average'): 65 | import scipy.cluster.hierarchy as hier 66 | Y = hier.linkage(inst_dm, method=linkage_type) 67 | Z = hier.dendrogram(Y, no_plot=True) 68 | inst_clust_order = Z['leaves'] 69 | all_dist = group_cutoffs() 70 | 71 | groups = {} 72 | for inst_dist in all_dist: 73 | inst_key = str(inst_dist).replace('.', '') 74 | groups[inst_key] = hier.fcluster(Y, inst_dist * inst_dm.max(), 'distance') 75 | groups[inst_key] = groups[inst_key].tolist() 76 | 77 | return inst_clust_order, groups 78 | 79 | def sort_rank_nodes(net, rowcol, rank_type): 80 | import numpy as np 81 | from operator import itemgetter 82 | from copy import deepcopy 83 | 84 | tmp_nodes = deepcopy(net.dat['nodes'][rowcol]) 85 | inst_mat = deepcopy(net.dat['mat']) 86 | 87 | sum_term = [] 88 | for i in range(len(tmp_nodes)): 89 | inst_dict = {} 90 | inst_dict['name'] = tmp_nodes[i] 91 | 92 | if rowcol == 'row': 93 | if rank_type == 'sum': 94 | inst_dict['rank'] = np.sum(inst_mat[i, :]) 95 | elif rank_type == 'var': 96 | inst_dict['rank'] = np.var(inst_mat[i, :]) 97 | else: 98 | if rank_type == 'sum': 99 | inst_dict['rank'] = np.sum(inst_mat[:, i]) 100 | elif rank_type == 'var': 101 | inst_dict['rank'] = np.var(inst_mat[:, i]) 102 | 103 | sum_term.append(inst_dict) 104 | 105 | sum_term = sorted(sum_term, key=itemgetter('rank'), reverse=False) 106 | 107 | tmp_sort_nodes = [] 108 | for inst_dict in sum_term: 109 | tmp_sort_nodes.append(inst_dict['name']) 110 | 111 | sort_index = [] 112 | for inst_node in tmp_nodes: 113 | sort_index.append(tmp_sort_nodes.index(inst_node)) 114 | 115 | return sort_index 116 | 117 | def group_cutoffs(): 118 | all_dist = [] 119 | for i in range(11): 120 | all_dist.append(float(i) / 10) 121 | return all_dist 122 | 
-------------------------------------------------------------------------------- /clustergrammer/cat_pval.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | from copy import deepcopy 4 | 5 | def main(net): 6 | ''' 7 | calculate pvalue of category closeness 8 | ''' 9 | # calculate the distance between the data points within the same category and 10 | # compare to null distribution 11 | for inst_rc in ['row', 'col']: 12 | 13 | inst_nodes = deepcopy(net.dat['nodes'][inst_rc]) 14 | 15 | inst_index = deepcopy(net.dat['node_info'][inst_rc]['clust']) 16 | 17 | # reorder based on clustered order 18 | inst_nodes = [ inst_nodes[i] for i in inst_index] 19 | 20 | # make distance matrix dataframe 21 | dm = dist_matrix_lattice(inst_nodes) 22 | 23 | node_infos = list(net.dat['node_info'][inst_rc].keys()) 24 | 25 | all_cats = [] 26 | for inst_info in node_infos: 27 | if 'dict_cat_' in inst_info: 28 | all_cats.append(inst_info) 29 | 30 | for cat_dict in all_cats: 31 | 32 | tmp_dict = net.dat['node_info'][inst_rc][cat_dict] 33 | 34 | pval_name = cat_dict.replace('dict_','pval_') 35 | net.dat['node_info'][inst_rc][pval_name] = {} 36 | 37 | for cat_name in tmp_dict: 38 | 39 | subset = tmp_dict[cat_name] 40 | 41 | inst_median = calc_median_dist_subset(dm, subset) 42 | 43 | hist = calc_hist_distances(dm, subset, inst_nodes) 44 | 45 | pval = 0 46 | 47 | for i in range(len(hist['prob'])): 48 | if i == 0: 49 | pval = hist['prob'][i] 50 | if i >= 1: 51 | if inst_median >= hist['bins'][i]: 52 | pval = pval + hist['prob'][i] 53 | 54 | net.dat['node_info'][inst_rc][pval_name][cat_name] = pval 55 | 56 | def dist_matrix_lattice(names): 57 | from scipy.spatial.distance import pdist, squareform 58 | 59 | lattice_size = len(names) 60 | mat = np.zeros([lattice_size, 1]) 61 | mat[:,0] = list(range(lattice_size)) 62 | 63 | inst_dm = pdist(mat, metric='euclidean') 64 | 65 | inst_dm[inst_dm < 0] = float(0) 66 | 67 | inst_dm = squareform(inst_dm) 68 | 69 | df = pd.DataFrame(data=inst_dm, columns=names, index=names) 70 | 71 | return df 72 | 73 | 74 | def calc_median_dist_subset(dm, subset): 75 | return np.median(dm[subset].ix[subset].values) 76 | 77 | def calc_hist_distances(dm, subset, inst_nodes): 78 | np.random.seed(100) 79 | 80 | num_null = 1000 81 | num_points = len(subset) 82 | 83 | median_dist = [] 84 | for i in range(num_null): 85 | tmp = np.random.choice(inst_nodes, num_points, replace=False) 86 | median_dist.append( np.median(dm[tmp].ix[tmp].values) ) 87 | 88 | tmp_dist = sorted(deepcopy(median_dist)) 89 | 90 | median_dist = np.asarray(median_dist) 91 | s1 = pd.Series(median_dist) 92 | hist = np.histogram(s1, bins=30) 93 | 94 | H = {} 95 | H['prob'] = hist[0]/np.float(num_null) 96 | H['bins'] = hist[1] 97 | 98 | return H -------------------------------------------------------------------------------- /clustergrammer/categories.py: -------------------------------------------------------------------------------- 1 | def check_categories(lines): 2 | ''' 3 | find out how many row and col categories are available 4 | ''' 5 | # count the number of row categories 6 | rcat_line = lines[0].split('\t') 7 | 8 | # calc the number of row names and categories 9 | num_rc = 0 10 | found_end = False 11 | 12 | # skip first tab 13 | for inst_string in rcat_line[1:]: 14 | if inst_string == '': 15 | if found_end is False: 16 | num_rc = num_rc + 1 17 | else: 18 | found_end = True 19 | 20 | max_rcat = 15 21 | if max_rcat > len(lines): 22 | max_rcat = 
len(lines) - 1 23 | 24 | num_cc = 0 25 | for i in range(max_rcat): 26 | ccat_line = lines[i + 1].split('\t') 27 | 28 | # make sure that line has length greater than one to prevent false cats from 29 | # trailing new lines at end of matrix 30 | if ccat_line[0] == '' and len(ccat_line) > 1: 31 | num_cc = num_cc + 1 32 | 33 | num_labels = {} 34 | num_labels['row'] = num_rc + 1 35 | num_labels['col'] = num_cc + 1 36 | 37 | return num_labels 38 | 39 | def dict_cat(net, define_cat_colors=False): 40 | ''' 41 | make a dictionary of node-category associations 42 | ''' 43 | 44 | # print('---------------------------------') 45 | # print('---- dict_cat: before setting cat colors') 46 | # print('---------------------------------\n') 47 | # print(define_cat_colors) 48 | # print(net.viz['cat_colors']) 49 | 50 | net.persistent_cat = True 51 | 52 | for inst_rc in ['row', 'col']: 53 | inst_keys = list(net.dat['node_info'][inst_rc].keys()) 54 | all_cats = [x for x in inst_keys if 'cat-' in x] 55 | 56 | for inst_name_cat in all_cats: 57 | 58 | dict_cat = {} 59 | tmp_cats = net.dat['node_info'][inst_rc][inst_name_cat] 60 | tmp_nodes = net.dat['nodes'][inst_rc] 61 | 62 | for i in range(len(tmp_cats)): 63 | inst_cat = tmp_cats[i] 64 | inst_node = tmp_nodes[i] 65 | 66 | if inst_cat not in dict_cat: 67 | dict_cat[inst_cat] = [] 68 | 69 | dict_cat[inst_cat].append(inst_node) 70 | 71 | tmp_name = 'dict_' + inst_name_cat.replace('-', '_') 72 | net.dat['node_info'][inst_rc][tmp_name] = dict_cat 73 | 74 | # merge with old cat_colors by default 75 | cat_colors = net.viz['cat_colors'] 76 | 77 | if define_cat_colors == True: 78 | cat_number = 0 79 | 80 | for inst_rc in ['row', 'col']: 81 | 82 | inst_keys = list(net.dat['node_info'][inst_rc].keys()) 83 | all_cats = [x for x in inst_keys if 'cat-' in x] 84 | 85 | for cat_index in all_cats: 86 | 87 | if cat_index not in cat_colors[inst_rc]: 88 | cat_colors[inst_rc][cat_index] = {} 89 | 90 | cat_names = sorted(list(set(net.dat['node_info'][inst_rc][cat_index]))) 91 | 92 | # loop through each category name and assign a color 93 | for tmp_name in cat_names: 94 | 95 | # using the same rules as the front-end to define cat_colors 96 | inst_color = get_cat_color(cat_number + cat_names.index(tmp_name)) 97 | 98 | check_name = tmp_name 99 | 100 | # check if category is string type and non-numeric 101 | try: 102 | float(check_name) 103 | is_string_cat = False 104 | except: 105 | is_string_cat = True 106 | 107 | if is_string_cat == True: 108 | # check for default non-color 109 | if ': ' in check_name: 110 | check_name = check_name.split(': ')[1] 111 | 112 | # if check_name == 'False' or check_name == 'false': 113 | if 'False' in check_name or 'false' in check_name: 114 | inst_color = '#eee' 115 | 116 | if 'Not ' in check_name: 117 | inst_color = '#eee' 118 | 119 | # print('cat_colors') 120 | # print('----------') 121 | # print(cat_colors[inst_rc][cat_index]) 122 | 123 | # do not overwrite old colors 124 | if tmp_name not in cat_colors[inst_rc][cat_index] and is_string_cat: 125 | 126 | cat_colors[inst_rc][cat_index][tmp_name] = inst_color 127 | # print('overwrite: ' + tmp_name + ' -> ' + str(inst_color)) 128 | 129 | cat_number = cat_number + 1 130 | 131 | net.viz['cat_colors'] = cat_colors 132 | 133 | # print('after setting cat_colors') 134 | # print(net.viz['cat_colors']) 135 | # print('======================================\n\n') 136 | 137 | def calc_cat_clust_order(net, inst_rc): 138 | ''' 139 | cluster category subset of data 140 | ''' 141 | from .__init__ import Network 142 | from 
copy import deepcopy 143 | from . import calc_clust, run_filter 144 | 145 | inst_keys = list(net.dat['node_info'][inst_rc].keys()) 146 | all_cats = [x for x in inst_keys if 'cat-' in x] 147 | 148 | if len(all_cats) > 0: 149 | 150 | for inst_name_cat in all_cats: 151 | 152 | tmp_name = 'dict_' + inst_name_cat.replace('-', '_') 153 | dict_cat = net.dat['node_info'][inst_rc][tmp_name] 154 | 155 | unordered_cats = dict_cat.keys() 156 | 157 | ordered_cats = order_categories(unordered_cats) 158 | 159 | # this is the ordering of the columns based on their category, not 160 | # including their clustering ordering within category 161 | all_cat_orders = [] 162 | tmp_names_list = [] 163 | for inst_cat in ordered_cats: 164 | 165 | inst_nodes = dict_cat[inst_cat] 166 | 167 | tmp_names_list.extend(inst_nodes) 168 | 169 | # cat_net = deepcopy(Network()) 170 | 171 | # cat_net.dat['mat'] = deepcopy(net.dat['mat']) 172 | # cat_net.dat['nodes'] = deepcopy(net.dat['nodes']) 173 | 174 | # cat_df = cat_net.dat_to_df() 175 | 176 | # sub_df = {} 177 | # if inst_rc == 'col': 178 | # sub_df['mat'] = cat_df['mat'][inst_nodes] 179 | # elif inst_rc == 'row': 180 | # # need to transpose df 181 | # cat_df['mat'] = cat_df['mat'].transpose() 182 | # sub_df['mat'] = cat_df['mat'][inst_nodes] 183 | # sub_df['mat'] = sub_df['mat'].transpose() 184 | 185 | # # filter matrix before clustering 186 | # ################################### 187 | # threshold = 0.0001 188 | # sub_df = run_filter.df_filter_row_sum(sub_df, threshold) 189 | # sub_df = run_filter.df_filter_col_sum(sub_df, threshold) 190 | 191 | # # load back to dat 192 | # cat_net.df_to_dat(sub_df) 193 | 194 | # cat_mat_shape = cat_net.dat['mat'].shape 195 | 196 | # print('***************') 197 | # try: 198 | # if cat_mat_shape[0]>1 and cat_mat_shape[1] > 1 and all_are_numbers == False: 199 | 200 | # calc_clust.cluster_row_and_col(cat_net, 'cos') 201 | # inst_cat_order = cat_net.dat['node_info'][inst_rc]['clust'] 202 | # else: 203 | # inst_cat_order = list(range(len(cat_net.dat['nodes'][inst_rc]))) 204 | 205 | # except: 206 | # inst_cat_order = list(range(len(cat_net.dat['nodes'][inst_rc]))) 207 | 208 | 209 | # prev_order_len = len(all_cat_orders) 210 | 211 | # # add prev order length to the current order number 212 | # inst_cat_order = [i + prev_order_len for i in inst_cat_order] 213 | # all_cat_orders.extend(inst_cat_order) 214 | 215 | # # generate ordered list of row/col names, which will be used to 216 | # # assign the order to specific nodes 217 | # names_clust_list = [x for (y, x) in sorted(zip(all_cat_orders, 218 | # tmp_names_list))] 219 | 220 | names_clust_list = tmp_names_list 221 | 222 | # calc category-cluster order 223 | final_order = [] 224 | 225 | for i in range(len(net.dat['nodes'][inst_rc])): 226 | 227 | inst_node_name = net.dat['nodes'][inst_rc][i] 228 | inst_node_num = names_clust_list.index(inst_node_name) 229 | 230 | final_order.append(inst_node_num) 231 | 232 | inst_index_cat = inst_name_cat.replace('-', '_') + '_index' 233 | 234 | net.dat['node_info'][inst_rc][inst_index_cat] = final_order 235 | 236 | 237 | def order_categories(unordered_cats): 238 | ''' 239 | If categories are strings, then simple ordering is fine. 240 | If categories are values then I'll need to order based on their values. 241 | The final ordering is given as the original categories (including titles) in a 242 | ordered list. 
243 | ''' 244 | 245 | no_titles = remove_titles(unordered_cats) 246 | 247 | all_are_numbers = check_all_numbers(no_titles) 248 | 249 | if all_are_numbers: 250 | ordered_cats = order_cats_based_on_values(unordered_cats, no_titles) 251 | else: 252 | ordered_cats = sorted(unordered_cats) 253 | 254 | return ordered_cats 255 | 256 | 257 | def order_cats_based_on_values(unordered_cats, values_list): 258 | import pandas as pd 259 | 260 | try: 261 | # convert values_list to values 262 | values_list = [float(i) for i in values_list] 263 | 264 | inst_series = pd.Series(data=values_list, index=unordered_cats) 265 | 266 | inst_series.sort_values(inplace=True) 267 | 268 | ordered_cats = inst_series.index.tolist() 269 | 270 | # ordered_cats = unordered_cats 271 | except: 272 | # keep default ordering if error occurs 273 | print('error sorting cats based on values ') 274 | ordered_cats = unordered_cats 275 | 276 | return ordered_cats 277 | 278 | def check_all_numbers(no_titles): 279 | all_numbers = True 280 | for tmp in no_titles: 281 | if is_number(tmp) == False: 282 | all_numbers = False 283 | 284 | return all_numbers 285 | 286 | def remove_titles(cats): 287 | from copy import deepcopy 288 | 289 | # check if all have titles 290 | ########################### 291 | all_have_titles = True 292 | 293 | for inst_cat in cats: 294 | if is_number(inst_cat) == False: 295 | if ': ' not in inst_cat: 296 | all_have_titles = False 297 | else: 298 | all_have_titles = False 299 | 300 | if all_have_titles: 301 | no_titles = cats 302 | no_titles = [i.split(': ')[1] for i in no_titles] 303 | 304 | else: 305 | no_titles = cats 306 | 307 | return no_titles 308 | 309 | def is_number(s): 310 | try: 311 | float(s) 312 | return True 313 | except ValueError: 314 | return False 315 | 316 | def get_cat_color(cat_num): 317 | 318 | all_colors = [ "#393b79", "#aec7e8", "#ff7f0e", "#ffbb78", "#98df8a", "#bcbd22", 319 | "#404040", "#ff9896", "#c5b0d5", "#8c564b", "#1f77b4", "#5254a3", "#FFDB58", 320 | "#c49c94", "#e377c2", "#7f7f7f", "#2ca02c", "#9467bd", "#dbdb8d", "#17becf", 321 | "#637939", "#6b6ecf", "#9c9ede", "#d62728", "#8ca252", "#8c6d31", "#bd9e39", 322 | "#e7cb94", "#843c39", "#ad494a", "#d6616b", "#7b4173", "#a55194", "#ce6dbd", 323 | "#de9ed6"]; 324 | 325 | inst_color = all_colors[cat_num % len(all_colors)] 326 | 327 | return inst_color 328 | 329 | def dendro_cats(net, axis, dendro_level): 330 | 331 | if axis == 0: 332 | axis = 'row' 333 | if axis == 1: 334 | axis = 'col' 335 | 336 | dendro_level = str(dendro_level) 337 | dendro_level_name = dendro_level 338 | if len(dendro_level) == 1: 339 | dendro_level = '0' + dendro_level 340 | 341 | df = net.export_df() 342 | 343 | if axis == 'row': 344 | old_names = df.index.tolist() 345 | elif axis == 'col': 346 | old_names = df.columns.tolist() 347 | 348 | if 'group' in net.dat['node_info'][axis]: 349 | inst_groups = net.dat['node_info'][axis]['group'][dendro_level] 350 | 351 | new_names = [] 352 | for i in range(len(old_names)): 353 | inst_name = old_names[i] 354 | group_cat = 'Group '+ str(dendro_level_name) +': cat-' + str(inst_groups[i]) 355 | inst_name = inst_name + (group_cat,) 356 | new_names.append(inst_name) 357 | 358 | if axis == 'row': 359 | df.index = new_names 360 | elif axis == 'col': 361 | df.columns = new_names 362 | 363 | net.load_df(df) 364 | 365 | else: 366 | print('please cluster, using make_clust, to define dendrogram groups before running dendro_cats') 367 | 368 | def add_cats(net, axis, cat_data): 369 | 370 | try: 371 | df = net.export_df() 372 | 373 | if 
axis == 'row': 374 | labels = df.index.tolist() 375 | elif axis == 'col': 376 | labels = df.columns.tolist() 377 | 378 | if 'title' in cat_data: 379 | inst_title = cat_data['title'] 380 | else: 381 | inst_title = 'New Category' 382 | 383 | all_cats = cat_data['cats'] 384 | 385 | # loop through all labels 386 | new_labels = [] 387 | for inst_label in labels: 388 | 389 | if type(inst_label) is tuple: 390 | check_name = inst_label[0] 391 | found_tuple = True 392 | else: 393 | check_name = inst_label 394 | found_tuple = False 395 | 396 | if ': ' in check_name: 397 | check_name = check_name.split(': ')[1] 398 | 399 | # default to False for found cat, overwrite if necessary 400 | found_cat = inst_title + ': False' 401 | 402 | # check all categories in cats 403 | for inst_cat in all_cats: 404 | 405 | inst_names = all_cats[inst_cat] 406 | 407 | if check_name in inst_names: 408 | found_cat = inst_title + ': ' + inst_cat 409 | 410 | # add category to label 411 | if found_tuple is True: 412 | new_label = inst_label + (found_cat,) 413 | else: 414 | new_label = (inst_label, found_cat) 415 | 416 | new_labels.append(new_label) 417 | 418 | 419 | # add labels back to DataFrame 420 | if axis == 'row': 421 | df.index = new_labels 422 | elif axis == 'col': 423 | df.columns = new_labels 424 | 425 | net.load_df(df) 426 | 427 | except: 428 | print('error adding new categories') 429 | 430 | 431 | 432 | 433 | -------------------------------------------------------------------------------- /clustergrammer/data_formats.py: -------------------------------------------------------------------------------- 1 | from . import make_unique_labels 2 | 3 | def df_to_dat(net, df, define_cat_colors=False): 4 | ''' 5 | This is always run when data is loaded. 6 | ''' 7 | from . import categories 8 | 9 | # check if df has unique values 10 | df['mat'] = make_unique_labels.main(net, df['mat']) 11 | 12 | net.dat['mat'] = df['mat'].values 13 | net.dat['nodes']['row'] = df['mat'].index.tolist() 14 | net.dat['nodes']['col'] = df['mat'].columns.tolist() 15 | 16 | for inst_rc in ['row', 'col']: 17 | 18 | if type(net.dat['nodes'][inst_rc][0]) is tuple: 19 | # get the number of categories from the length of the tuple 20 | # subtract 1 because the name is the first element of the tuple 21 | num_cat = len(net.dat['nodes'][inst_rc][0]) - 1 22 | 23 | net.dat['node_info'][inst_rc]['full_names'] = net.dat['nodes']\ 24 | [inst_rc] 25 | 26 | for inst_rcat in range(num_cat): 27 | net.dat['node_info'][inst_rc]['cat-' + str(inst_rcat)] = \ 28 | [i[inst_rcat + 1] for i in net.dat['nodes'][inst_rc]] 29 | 30 | net.dat['nodes'][inst_rc] = [i[0] for i in net.dat['nodes'][inst_rc]] 31 | 32 | if 'mat_up' in df: 33 | net.dat['mat_up'] = df['mat_up'].values 34 | net.dat['mat_dn'] = df['mat_dn'].values 35 | 36 | if 'mat_orig' in df: 37 | net.dat['mat_orig'] = df['mat_orig'].values 38 | 39 | categories.dict_cat(net, define_cat_colors=define_cat_colors) 40 | 41 | def dat_to_df(net): 42 | import pandas as pd 43 | 44 | df = {} 45 | nodes = {} 46 | for inst_rc in ['row', 'col']: 47 | if 'full_names' in net.dat['node_info'][inst_rc]: 48 | nodes[inst_rc] = net.dat['node_info'][inst_rc]['full_names'] 49 | else: 50 | nodes[inst_rc] = net.dat['nodes'][inst_rc] 51 | 52 | df['mat'] = pd.DataFrame(data=net.dat['mat'], columns=nodes['col'], 53 | index=nodes['row']) 54 | 55 | if 'mat_up' in net.dat: 56 | 57 | df['mat_up'] = pd.DataFrame(data=net.dat['mat_up'], 58 | columns=nodes['col'], index=nodes['row']) 59 | 60 | df['mat_dn'] = pd.DataFrame(data=net.dat['mat_dn'], 61 | 
columns=nodes['col'], index=nodes['row']) 62 | 63 | if 'mat_orig' in net.dat: 64 | df['mat_orig'] = pd.DataFrame(data=net.dat['mat_orig'], 65 | columns=nodes['col'], index=nodes['row']) 66 | 67 | return df 68 | 69 | def mat_to_numpy_arr(self): 70 | ''' convert list to numpy array - numpy arrays can not be saved as json ''' 71 | import numpy as np 72 | self.dat['mat'] = np.asarray(self.dat['mat']) -------------------------------------------------------------------------------- /clustergrammer/downsample_fun.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from sklearn.cluster import MiniBatchKMeans 4 | # string used to format titles 5 | super_string = ': ' 6 | 7 | def main(net, df=None, ds_type='kmeans', axis='row', num_samples=100, random_state=1000): 8 | 9 | if df is None: 10 | df = net.export_df() 11 | 12 | # # run downsampling 13 | # random_state = 1000 14 | 15 | ds_df, ds_data = run_kmeans_mini_batch(df, num_samples, axis, random_state) 16 | 17 | net.load_df(ds_df) 18 | 19 | return ds_data 20 | 21 | def run_kmeans_mini_batch(df, num_samples=100, axis='row', random_state=1000): 22 | 23 | 24 | # gather downsampled axis information 25 | if axis == 'row': 26 | X = df 27 | orig_labels = df.index.tolist() 28 | non_ds_labels = df.columns.tolist() 29 | 30 | else: 31 | X = df.transpose() 32 | orig_labels = df.columns.tolist() 33 | non_ds_labels = df.index.tolist() 34 | 35 | cat_index = 1 36 | 37 | # run until the number of returned clusters with data-points is equal to the 38 | # number of requested clusters 39 | num_returned_clusters = 0 40 | while num_samples != num_returned_clusters: 41 | 42 | clusters, num_returned_clusters, cluster_data, cluster_pop = \ 43 | calc_mbk_clusters(X, num_samples, random_state) 44 | 45 | random_state = random_state + random_state 46 | 47 | clust_numbers = range(num_returned_clusters) 48 | clust_labels = [ 'cluster-' + str(i) for i in clust_numbers] 49 | 50 | if type(orig_labels[0]) is tuple: 51 | found_cats = True 52 | else: 53 | found_cats = False 54 | 55 | # Gather categories if necessary 56 | ######################################## 57 | # check if there are categories 58 | if found_cats: 59 | all_cats = generate_cat_data(cluster_data, orig_labels, num_samples) 60 | 61 | # genrate cluster labels, e.g. 
add number in each cluster and majority cat 62 | # if necessary 63 | cluster_labels = [] 64 | for i in range(num_returned_clusters): 65 | 66 | inst_name = 'Cluster: ' + clust_labels[i] 67 | num_in_clust_string = 'number in clust: '+ str(cluster_pop[i]) 68 | 69 | inst_tuple = (inst_name,) 70 | 71 | if found_cats: 72 | for cat_data in all_cats: 73 | cat_values = cat_data['counts'][i] 74 | max_cat_fraction = cat_values.max() 75 | max_index = np.where(cat_values == max_cat_fraction)[0][0] 76 | max_cat_name = cat_data['types'][max_index] 77 | 78 | # add category title if available 79 | cat_name_string = 'Majority-'+ cat_data['title'] +': ' + max_cat_name 80 | 81 | inst_tuple = inst_tuple + (cat_name_string,) 82 | 83 | inst_tuple = inst_tuple + (num_in_clust_string,) 84 | 85 | cluster_labels.append(inst_tuple) 86 | 87 | # ds_df is always downsampling the rows, if the user wants to downsample the 88 | # columns, the df will be switched back later 89 | ds_df = pd.DataFrame(data=clusters, index=cluster_labels, columns=non_ds_labels) 90 | 91 | # swap back for downsampled columns 92 | if axis == 'col': 93 | ds_df = ds_df.transpose() 94 | 95 | return ds_df, cluster_data 96 | 97 | def generate_cat_data(cluster_data, orig_labels, num_samples): 98 | 99 | # generate an array of orig_labels, using an array so that I can gather 100 | # label subsets using indices 101 | orig_array = np.asarray(orig_labels) 102 | 103 | example_label = orig_labels[0] 104 | 105 | # find out how many string categories are available 106 | num_cats = 0 107 | for i in range(len(example_label)): 108 | 109 | if i > 0: 110 | inst_cat = example_label[i] 111 | if super_string in inst_cat: 112 | inst_cat = inst_cat.split(super_string)[1] 113 | 114 | string_cat = True 115 | try: 116 | float(inst_cat) 117 | string_cat = False 118 | except: 119 | string_cat = True 120 | 121 | if string_cat: 122 | num_cats = num_cats + 1 123 | 124 | all_cats = [] 125 | 126 | for cat_index in range(num_cats): 127 | 128 | # index zero is for the names 129 | cat_index = cat_index + 1 130 | 131 | cat_data = {} 132 | 133 | if super_string in example_label[cat_index]: 134 | cat_data['title'] = example_label[cat_index].split(super_string)[0] 135 | else: 136 | cat_data['title'] = 'Category' 137 | 138 | # if there are string categories, then keep track of how many of each category 139 | # are found in each of the downsampled clusters. 
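  # cat_data ends up with three fields: 'title' (the category name), 'types'
  # (the sorted unique category values), and 'counts' (for each cluster, the
  # fraction of its members carrying each value), filled in below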
140 | cat_data['types'] = [] 141 | 142 | # gather possible categories 143 | for inst_label in orig_labels: 144 | 145 | inst_cat = inst_label[cat_index] 146 | 147 | if super_string in inst_cat: 148 | inst_cat = inst_cat.split(super_string)[1] 149 | 150 | # get first category 151 | cat_data['types'].append(inst_cat) 152 | 153 | cat_data['types'] = sorted(list(set(cat_data['types']))) 154 | 155 | num_cats = len(cat_data['types']) 156 | 157 | # initialize cat_data['counts'] dictionary 158 | cat_data['counts'] = {} 159 | for inst_clust in range(num_samples): 160 | cat_data['counts'][inst_clust] = np.zeros([num_cats]) 161 | 162 | # populate cat_data['counts'] 163 | for inst_clust in range(num_samples): 164 | 165 | # get the indicies of all original labels that fall in the cluster 166 | found = np.where(cluster_data == inst_clust) 167 | found_indicies = found[0] 168 | 169 | clust_names = orig_array[found_indicies] 170 | 171 | for inst_name in clust_names: 172 | 173 | # get first category name 174 | inst_name = inst_name[cat_index] 175 | 176 | if super_string in inst_name: 177 | inst_name = inst_name.split(super_string)[1] 178 | 179 | tmp_index = cat_data['types'].index(inst_name) 180 | 181 | cat_data['counts'][inst_clust][tmp_index] = cat_data['counts'][inst_clust][tmp_index] + 1 182 | 183 | # calculate fractions 184 | for inst_clust in range(num_samples): 185 | # get array 186 | counts = cat_data['counts'][inst_clust] 187 | inst_total = np.sum(counts) 188 | cat_data['counts'][inst_clust] = cat_data['counts'][inst_clust] / inst_total 189 | 190 | all_cats.append(cat_data) 191 | 192 | return all_cats 193 | 194 | def calc_mbk_clusters(X, n_clusters, random_state=1000): 195 | 196 | # kmeans is run with rows as data-points and columns as dimensions 197 | mbk = MiniBatchKMeans(init='k-means++', n_clusters=n_clusters, 198 | max_no_improvement=100, verbose=0, 199 | random_state=random_state) 200 | 201 | # need to loop through each label (each k-means cluster) and count how many 202 | # points were given this label. This will give the population size of each label 203 | mbk.fit(X) 204 | cluster_data = mbk.labels_ 205 | clusters = mbk.cluster_centers_ 206 | 207 | mbk_cluster_names, cluster_pop = np.unique(cluster_data, return_counts=True) 208 | 209 | num_returned_clusters = len(cluster_pop) 210 | 211 | return clusters, num_returned_clusters, cluster_data, cluster_pop -------------------------------------------------------------------------------- /clustergrammer/enrichr_functions.py: -------------------------------------------------------------------------------- 1 | def add_enrichr_cats(df, inst_rc, run_enrichr, num_terms=10): 2 | from copy import deepcopy 3 | 4 | tmp_gene_list = deepcopy(df.index.tolist()) 5 | 6 | gene_list = [] 7 | if type(tmp_gene_list[0]) is tuple: 8 | for inst_tuple in tmp_gene_list: 9 | gene_list.append(inst_tuple[0]) 10 | else: 11 | gene_list = tmp_gene_list 12 | 13 | orig_gene_list = deepcopy(gene_list) 14 | 15 | # set up for non-tuple case first 16 | if ': ' in gene_list[0]: 17 | # strip titles 18 | gene_list = [inst_gene.split(': ')[1] for inst_gene in gene_list] 19 | 20 | # strip extra information (e.g. 
PTMs) 21 | gene_list = [inst_gene.split('_')[0] for inst_gene in gene_list] 22 | gene_list = [inst_gene.split(' ')[0] for inst_gene in gene_list] 23 | gene_list = [inst_gene.split('-')[0] for inst_gene in gene_list] 24 | 25 | user_list_id = post_request(gene_list) 26 | 27 | enr, response_list = get_request(run_enrichr, user_list_id, max_terms=20) 28 | 29 | # p-value, adjusted pvalue, z-score, combined score, genes 30 | # 1: Term 31 | # 2: P-value 32 | # 3: Z-score 33 | # 4: Combined Score 34 | # 5: Genes 35 | # 6: pval_bh 36 | 37 | # while generating categories store as list of lists, then convert to list of 38 | # tuples 39 | 40 | bar_info = [] 41 | cat_list = [] 42 | for inst_gene in orig_gene_list: 43 | cat_list.append([inst_gene]) 44 | 45 | for inst_enr in response_list[0:num_terms]: 46 | inst_term = inst_enr[1] 47 | inst_pval = inst_enr[2] 48 | inst_cs = inst_enr[4] 49 | inst_list = inst_enr[5] 50 | 51 | pval_string = '
Pval ' + str(inst_pval) + '
' 52 | 53 | bar_info.append(inst_cs) 54 | 55 | for inst_info in cat_list: 56 | 57 | # strip titles 58 | gene_name = inst_info[0] 59 | 60 | if ': ' in gene_name: 61 | gene_name = gene_name.split(': ')[1] 62 | 63 | # strip extra information (e.g. PTMs) 64 | gene_name = gene_name.split('_')[0] 65 | gene_name = gene_name.split(' ')[0] 66 | gene_name = gene_name.split('-')[0] 67 | 68 | if gene_name in inst_list: 69 | inst_info.append(inst_term+': True'+ pval_string) 70 | else: 71 | inst_info.append(inst_term+': False'+pval_string) 72 | 73 | cat_list = [tuple(x) for x in cat_list] 74 | 75 | df.index = cat_list 76 | 77 | return df, bar_info 78 | 79 | def clust_from_response(response_list): 80 | from clustergrammer import Network 81 | import scipy 82 | import json 83 | import pandas as pd 84 | import math 85 | from copy import deepcopy 86 | 87 | # print('----------------------') 88 | # print('enrichr_clust_from_response') 89 | # print('----------------------') 90 | 91 | ini_enr = transfer_to_enr_dict( response_list ) 92 | 93 | enr = [] 94 | scores = {} 95 | score_types = ['combined_score','pval','zscore'] 96 | 97 | for score_type in score_types: 98 | scores[score_type] = pd.Series() 99 | 100 | for inst_enr in ini_enr: 101 | if inst_enr['combined_score'] > 0: 102 | 103 | # make series of enriched terms with scores 104 | for score_type in score_types: 105 | 106 | # collect the scores of the enriched terms 107 | if score_type == 'combined_score': 108 | scores[score_type][inst_enr['name']] = inst_enr[score_type] 109 | if score_type == 'pval': 110 | scores[score_type][inst_enr['name']] = -math.log(inst_enr[score_type]) 111 | if score_type == 'zscore': 112 | scores[score_type][inst_enr['name']] = -inst_enr[score_type] 113 | 114 | # keep enrichement values 115 | enr.append(inst_enr) 116 | 117 | # sort and normalize the scores 118 | for score_type in score_types: 119 | scores[score_type] = scores[score_type]/scores[score_type].max() 120 | scores[score_type].sort_values(ascending=False) 121 | 122 | number_of_enriched_terms = len(scores['combined_score']) 123 | 124 | enr_score_types = ['combined_score','pval','zscore'] 125 | 126 | if number_of_enriched_terms <10: 127 | num_dict = {'ten':10} 128 | elif number_of_enriched_terms <20: 129 | num_dict = {'ten':10, 'twenty':20} 130 | else: 131 | num_dict = {'ten':10, 'twenty':20, 'thirty':30} 132 | 133 | # gather lists of top scores 134 | top_terms = {} 135 | for enr_type in enr_score_types: 136 | top_terms[enr_type] = {} 137 | for num_terms in list(num_dict.keys()): 138 | inst_num = num_dict[num_terms] 139 | top_terms[enr_type][num_terms] = scores[enr_type].index.tolist()[: inst_num] 140 | 141 | # gather the terms that should be kept - they are at the top of the score list 142 | keep_terms = [] 143 | for inst_enr_score in top_terms: 144 | for tmp_num in list(num_dict.keys()): 145 | keep_terms.extend( top_terms[inst_enr_score][tmp_num] ) 146 | 147 | keep_terms = list(set(keep_terms)) 148 | 149 | # keep enriched terms that are at the top 10 based on at least one score 150 | keep_enr = [] 151 | for inst_enr in enr: 152 | if inst_enr['name'] in keep_terms: 153 | keep_enr.append(inst_enr) 154 | 155 | 156 | # fill in full matrix 157 | ####################### 158 | 159 | # genes 160 | row_node_names = [] 161 | # enriched terms 162 | col_node_names = [] 163 | 164 | # gather information from the list of enriched terms 165 | for inst_enr in keep_enr: 166 | col_node_names.append(inst_enr['name']) 167 | row_node_names.extend(inst_enr['int_genes']) 168 | 169 | row_node_names 
= sorted(list(set(row_node_names))) 170 | 171 | net = Network() 172 | net.dat['nodes']['row'] = row_node_names 173 | net.dat['nodes']['col'] = col_node_names 174 | net.dat['mat'] = scipy.zeros([len(row_node_names),len(col_node_names)]) 175 | 176 | for inst_enr in keep_enr: 177 | 178 | inst_term = inst_enr['name'] 179 | col_index = col_node_names.index(inst_term) 180 | 181 | # use combined score for full matrix - will not be seen in viz 182 | tmp_score = scores['combined_score'][inst_term] 183 | net.dat['node_info']['col']['value'].append(tmp_score) 184 | 185 | for inst_gene in inst_enr['int_genes']: 186 | row_index = row_node_names.index(inst_gene) 187 | 188 | # save association 189 | net.dat['mat'][row_index, col_index] = 1 190 | 191 | # cluster full matrix 192 | ############################# 193 | # do not make multiple views 194 | views = [''] 195 | 196 | if len(net.dat['nodes']['row']) > 1: 197 | net.make_clust(dist_type='jaccard', views=views, dendro=False) 198 | else: 199 | net.make_clust(dist_type='jaccard', views=views, dendro=False, run_clustering=False) 200 | 201 | # get dataframe from full matrix 202 | df = net.dat_to_df() 203 | 204 | for score_type in score_types: 205 | 206 | for num_terms in num_dict: 207 | 208 | inst_df = deepcopy(df) 209 | inst_net = deepcopy(Network()) 210 | 211 | inst_df['mat'] = inst_df['mat'][top_terms[score_type][num_terms]] 212 | 213 | # load back into net 214 | inst_net.df_to_dat(inst_df) 215 | 216 | # make views 217 | if len(net.dat['nodes']['row']) > 1: 218 | inst_net.make_clust(dist_type='jaccard', views=['N_row_sum'], dendro=False) 219 | else: 220 | inst_net.make_clust(dist_type='jaccard', views=['N_row_sum'], dendro=False, run_clustering = False) 221 | 222 | inst_views = inst_net.viz['views'] 223 | 224 | # add score_type to views 225 | for inst_view in inst_views: 226 | 227 | inst_view['N_col_sum'] = num_dict[num_terms] 228 | 229 | inst_view['enr_score_type'] = score_type 230 | 231 | # add values to col_nodes and order according to rank 232 | for inst_col in inst_view['nodes']['col_nodes']: 233 | 234 | inst_col['rank'] = len(top_terms[score_type][num_terms]) - top_terms[score_type][num_terms].index(inst_col['name']) 235 | 236 | inst_name = inst_col['name'] 237 | inst_col['value'] = scores[score_type][inst_name] 238 | 239 | # add views to main network 240 | net.viz['views'].extend(inst_views) 241 | 242 | return net 243 | 244 | # make the get request to enrichr using the requests library 245 | # this is done before making the get request with the lib name 246 | def post_request(input_genes, meta=''): 247 | # get metadata 248 | import requests 249 | import json 250 | 251 | # stringify list 252 | input_genes = '\n'.join(input_genes) 253 | 254 | # define post url 255 | post_url = 'http://amp.pharm.mssm.edu/Enrichr/addList' 256 | 257 | # define parameters 258 | params = {'list':input_genes, 'description':''} 259 | 260 | # make request: post the gene list 261 | post_response = requests.post( post_url, files=params) 262 | 263 | # load json 264 | inst_dict = json.loads( post_response.text ) 265 | userListId = str(inst_dict['userListId']) 266 | 267 | # return the userListId that is needed to reference the list later 268 | return userListId 269 | 270 | # make the get request to enrichr using the requests library 271 | # this is done after submitting post request with the input gene list 272 | def get_request(lib, userListId, max_terms=50 ): 273 | import requests 274 | import json 275 | 276 | # convert userListId to string 277 | userListId = str(userListId) 
278 | 279 | # define the get url 280 | get_url = 'http://amp.pharm.mssm.edu/Enrichr/enrich' 281 | 282 | # get parameters 283 | params = {'backgroundType':lib,'userListId':userListId} 284 | 285 | # try get request until status code is 200 286 | inst_status_code = 400 287 | 288 | # wait until okay status code is returned 289 | num_try = 0 290 | 291 | # print(('\tEnrichr enrichment get req userListId: '+str(userListId))) 292 | 293 | while inst_status_code == 400 and num_try < 100: 294 | num_try = num_try +1 295 | try: 296 | # make the get request to get the enrichr results 297 | 298 | try: 299 | get_response = requests.get( get_url, params=params ) 300 | 301 | # get status_code 302 | inst_status_code = get_response.status_code 303 | 304 | except: 305 | print('retry get request') 306 | 307 | except: 308 | print('get requests failed') 309 | 310 | # load as dictionary 311 | resp_json = json.loads( get_response.text ) 312 | 313 | # get the key 314 | only_key = list(resp_json.keys())[0] 315 | 316 | # get response_list 317 | response_list = resp_json[only_key] 318 | 319 | # transfer the response_list to the enr_dict 320 | enr = transfer_to_enr_dict( response_list, max_terms ) 321 | 322 | # return enrichment json and userListId 323 | return enr, response_list 324 | 325 | # transfer the response_list to a list of dictionaries 326 | def transfer_to_enr_dict(response_list, max_terms=50): 327 | 328 | # # reduce the number of enriched terms if necessary 329 | # if len(response_list) < num_terms: 330 | # num_terms = len(response_list) 331 | 332 | # p-value, adjusted pvalue, z-score, combined score, genes 333 | # 1: Term 334 | # 2: P-value 335 | # 3: Z-score 336 | # 4: Combined Score 337 | # 5: Genes 338 | # 6: pval_bh 339 | 340 | num_enr_term = len(response_list) 341 | if num_enr_term > max_terms: 342 | num_enr_term = max_terms 343 | 344 | # transfer response_list to enr structure 345 | # and only keep the top terms 346 | # 347 | # initialize enr 348 | enr = [] 349 | for i in range(num_enr_term): 350 | 351 | # get list element 352 | inst_enr = response_list[i] 353 | 354 | # initialize dict 355 | inst_dict = {} 356 | 357 | # transfer term 358 | inst_dict['name'] = inst_enr[1] 359 | # transfer pval 360 | inst_dict['pval'] = inst_enr[2] 361 | # transfer zscore 362 | inst_dict['zscore'] = inst_enr[3] 363 | # transfer combined_score 364 | inst_dict['combined_score'] = inst_enr[4] 365 | # transfer int_genes 366 | inst_dict['int_genes'] = inst_enr[5] 367 | # adjusted pval 368 | inst_dict['pval_bh'] = inst_enr[6] 369 | 370 | # append dict 371 | enr.append(inst_dict) 372 | 373 | return enr 374 | 375 | -------------------------------------------------------------------------------- /clustergrammer/export_data.py: -------------------------------------------------------------------------------- 1 | def export_net_json(net, net_type, indent='no-indent'): 2 | ''' export json string of dat ''' 3 | import json 4 | from copy import deepcopy 5 | 6 | if net_type == 'dat': 7 | exp_dict = deepcopy(net.dat) 8 | 9 | if type(exp_dict['mat']) is not list: 10 | exp_dict['mat'] = exp_dict['mat'].tolist() 11 | if 'mat_orig' in exp_dict: 12 | exp_dict['mat_orig'] = exp_dict['mat_orig'].tolist() 13 | 14 | elif net_type == 'viz': 15 | exp_dict = net.viz 16 | 17 | elif net_type == 'sim_row': 18 | exp_dict = net.sim['row'] 19 | 20 | elif net_type == 'sim_col': 21 | exp_dict = net.sim['col'] 22 | 23 | # make json 24 | if indent == 'indent': 25 | exp_json = json.dumps(exp_dict, indent=2) 26 | else: 27 | exp_json = json.dumps(exp_dict) 28 | 
29 | return exp_json 30 | 31 | def write_matrix_to_tsv(net, filename=None, df=None): 32 | ''' 33 | This will export the matrix in net.dat or a dataframe (optional df in 34 | arguments) as a tsv file. Row/column categories will be saved as tuples in 35 | tsv, which can be read back into the network object. 36 | ''' 37 | import pandas as pd 38 | 39 | if df is None: 40 | df = net.dat_to_df() 41 | 42 | return df['mat'].to_csv(filename, sep='\t') 43 | 44 | def write_json_to_file(net, net_type, filename, indent='no-indent'): 45 | 46 | exp_json = net.export_net_json(net_type, indent) 47 | 48 | fw = open(filename, 'w') 49 | fw.write(exp_json) 50 | fw.close() 51 | 52 | def save_dict_to_json(inst_dict, filename, indent='no-indent'): 53 | import json 54 | fw = open(filename, 'w') 55 | if indent == 'indent': 56 | fw.write(json.dumps(inst_dict, indent=2)) 57 | else: 58 | fw.write(json.dumps(inst_dict)) 59 | fw.close() -------------------------------------------------------------------------------- /clustergrammer/iframe_web_app.py: -------------------------------------------------------------------------------- 1 | def main(net, filename=None, width=1000, height=800): 2 | import requests, json 3 | # from io import StringIO 4 | from IPython.display import IFrame, display 5 | 6 | try: 7 | from StringIO import StringIO 8 | except ImportError: 9 | from io import StringIO 10 | 11 | clustergrammer_url = 'http://amp.pharm.mssm.edu/clustergrammer/matrix_upload/' 12 | 13 | if filename is None: 14 | file_string = net.write_matrix_to_tsv() 15 | file_obj = StringIO(file_string) 16 | 17 | if net.dat['filename'] is None: 18 | fake_filename = 'Network.txt' 19 | else: 20 | fake_filename = net.dat['filename'] 21 | 22 | r = requests.post(clustergrammer_url, files={'file': (fake_filename, file_obj)}) 23 | else: 24 | file_obj = open(filename, 'r') 25 | r = requests.post(clustergrammer_url, files={'file': file_obj}) 26 | 27 | 28 | link = r.text 29 | 30 | display(IFrame(link, width=width, height=height)) 31 | 32 | return link -------------------------------------------------------------------------------- /clustergrammer/initialize_net.py: -------------------------------------------------------------------------------- 1 | def main(self, widget=None): 2 | 3 | self.dat = {} 4 | self.dat['nodes'] = {} 5 | self.dat['nodes']['row'] = [] 6 | self.dat['nodes']['col'] = [] 7 | self.dat['mat'] = [] 8 | 9 | self.dat['node_info'] = {} 10 | for inst_rc in self.dat['nodes']: 11 | self.dat['node_info'][inst_rc] = {} 12 | self.dat['node_info'][inst_rc]['ini'] = [] 13 | self.dat['node_info'][inst_rc]['clust'] = [] 14 | self.dat['node_info'][inst_rc]['rank'] = [] 15 | self.dat['node_info'][inst_rc]['info'] = [] 16 | self.dat['node_info'][inst_rc]['cat'] = [] 17 | self.dat['node_info'][inst_rc]['value'] = [] 18 | 19 | # check if net has categories predefined 20 | if hasattr(self, 'persistent_cat') == False: 21 | self.persistent_cat = False 22 | found_cats = False 23 | else: 24 | found_cats = True 25 | inst_cat_colors = self.viz['cat_colors'] 26 | 27 | # add widget if necessary 28 | if widget != None: 29 | self.widget_class = widget 30 | 31 | self.viz = {} 32 | self.viz['row_nodes'] = [] 33 | self.viz['col_nodes'] = [] 34 | self.viz['links'] = [] 35 | self.viz['mat'] = [] 36 | 37 | if found_cats == False: 38 | # print('no persistent_cat') 39 | self.viz['cat_colors'] = {} 40 | self.viz['cat_colors']['row'] = {} 41 | self.viz['cat_colors']['col'] = {} 42 | else: 43 | # print('yes persistent_cat') 44 | self.viz['cat_colors'] = inst_cat_colors 
45 | 46 | self.sim = {} 47 | 48 | 49 | def viz(self, reset_cat_colors=False): 50 | 51 | # keep track of old cat_colors 52 | old_cat_colors = self.viz['cat_colors'] 53 | 54 | self.viz = {} 55 | self.viz['row_nodes'] = [] 56 | self.viz['col_nodes'] = [] 57 | self.viz['links'] = [] 58 | self.viz['mat'] = [] 59 | 60 | if reset_cat_colors == True: 61 | self.viz['cat_colors'] = {} 62 | self.viz['cat_colors']['row'] = {} 63 | self.viz['cat_colors']['col'] = {} 64 | else: 65 | self.viz['cat_colors'] = old_cat_colors 66 | -------------------------------------------------------------------------------- /clustergrammer/load_data.py: -------------------------------------------------------------------------------- 1 | import io, sys 2 | import json 3 | import pandas as pd 4 | from . import categories 5 | from . import proc_df_labels 6 | from . import data_formats 7 | from . import make_unique_labels 8 | 9 | try: 10 | from StringIO import StringIO 11 | except ImportError: 12 | from io import StringIO 13 | 14 | def load_file(net, filename): 15 | # reset network when loaing file, prevents errors when loading new file 16 | # have persistent categories 17 | 18 | # trying to improve re-initialization 19 | # net.__init__() 20 | net.reset() 21 | 22 | f = open(filename, 'r') 23 | 24 | file_string = f.read() 25 | f.close() 26 | 27 | load_file_as_string(net, file_string, filename) 28 | 29 | def load_file_as_string(net, file_string, filename=''): 30 | 31 | if (sys.version_info > (3, 0)): 32 | # python 3 33 | #################### 34 | file_string = str(file_string) 35 | else: 36 | # python 2 37 | #################### 38 | file_string = unicode(file_string) 39 | 40 | buff = io.StringIO(file_string) 41 | 42 | if '/' in filename: 43 | filename = filename.split('/')[-1] 44 | 45 | net.load_tsv_to_net(buff, filename) 46 | 47 | def load_stdin(net): 48 | data = '' 49 | 50 | for line in sys.stdin: 51 | data = data + line 52 | 53 | data = StringIO.StringIO(data) 54 | 55 | net.load_tsv_to_net(data) 56 | 57 | def load_tsv_to_net(net, file_buffer, filename=None): 58 | lines = file_buffer.getvalue().split('\n') 59 | num_labels = categories.check_categories(lines) 60 | 61 | row_arr = list(range(num_labels['row'])) 62 | col_arr = list(range(num_labels['col'])) 63 | tmp_df = {} 64 | 65 | # use header if there are col categories 66 | if len(col_arr) > 1: 67 | tmp_df['mat'] = pd.read_table(file_buffer, index_col=row_arr, 68 | header=col_arr) 69 | else: 70 | tmp_df['mat'] = pd.read_table(file_buffer, index_col=row_arr) 71 | 72 | tmp_df = proc_df_labels.main(tmp_df) 73 | 74 | net.df_to_dat(tmp_df, True) 75 | net.dat['filename'] = filename 76 | 77 | def load_json_to_dict(filename): 78 | f = open(filename, 'r') 79 | inst_dict = json.load(f) 80 | f.close() 81 | return inst_dict 82 | 83 | def load_gmt(filename): 84 | f = open(filename, 'r') 85 | lines = f.readlines() 86 | f.close() 87 | gmt = {} 88 | for i in range(len(lines)): 89 | inst_line = lines[i].rstrip() 90 | inst_term = inst_line.split('\t')[0] 91 | inst_elems = inst_line.split('\t')[2:] 92 | gmt[inst_term] = inst_elems 93 | 94 | return gmt 95 | 96 | def load_data_to_net(net, inst_net): 97 | ''' load data into nodes and mat, also convert mat to numpy array''' 98 | net.dat['nodes'] = inst_net['nodes'] 99 | net.dat['mat'] = inst_net['mat'] 100 | data_formats.mat_to_numpy_arr(net) -------------------------------------------------------------------------------- /clustergrammer/load_vect_post.py: -------------------------------------------------------------------------------- 1 | def 
main(real_net, vect_post): 2 | import numpy as np 3 | from copy import deepcopy 4 | from .__init__ import Network 5 | from . import proc_df_labels 6 | 7 | net = deepcopy(Network()) 8 | 9 | sigs = vect_post['columns'] 10 | 11 | all_rows = [] 12 | all_sigs = [] 13 | for inst_sig in sigs: 14 | all_sigs.append(inst_sig['col_name']) 15 | 16 | col_data = inst_sig['data'] 17 | 18 | for inst_row_data in col_data: 19 | all_rows.append(inst_row_data['row_name']) 20 | 21 | all_rows = sorted(list(set(all_rows))) 22 | all_sigs = sorted(list(set(all_sigs))) 23 | 24 | net.dat['nodes']['row'] = all_rows 25 | net.dat['nodes']['col'] = all_sigs 26 | 27 | net.dat['mat'] = np.empty((len(all_rows), len(all_sigs))) 28 | net.dat['mat'][:] = np.nan 29 | 30 | is_up_down = False 31 | if 'is_up_down' in vect_post: 32 | if vect_post['is_up_down'] is True: 33 | is_up_down = True 34 | 35 | if is_up_down is True: 36 | net.dat['mat_up'] = np.empty((len(all_rows), len(all_sigs))) 37 | net.dat['mat_up'][:] = np.nan 38 | 39 | net.dat['mat_dn'] = np.empty((len(all_rows), len(all_sigs))) 40 | net.dat['mat_dn'][:] = np.nan 41 | 42 | for inst_sig in sigs: 43 | inst_sig_name = inst_sig['col_name'] 44 | col_data = inst_sig['data'] 45 | 46 | for inst_row_data in col_data: 47 | inst_row = inst_row_data['row_name'] 48 | inst_value = inst_row_data['val'] 49 | 50 | row_index = all_rows.index(inst_row) 51 | col_index = all_sigs.index(inst_sig_name) 52 | 53 | net.dat['mat'][row_index, col_index] = inst_value 54 | 55 | if is_up_down is True: 56 | net.dat['mat_up'][row_index, col_index] = inst_row_data['val_up'] 57 | net.dat['mat_dn'][row_index, col_index] = inst_row_data['val_dn'] 58 | 59 | tmp_df = net.dat_to_df() 60 | tmp_df = proc_df_labels.main(tmp_df) 61 | 62 | real_net.df_to_dat(tmp_df) 63 | -------------------------------------------------------------------------------- /clustergrammer/make_clust_fun.py: -------------------------------------------------------------------------------- 1 | def make_clust(net, dist_type='cosine', run_clustering=True, dendro=True, 2 | requested_views=['pct_row_sum', 'N_row_sum'], 3 | linkage_type='average', sim_mat=False, filter_sim=0.1, 4 | calc_cat_pval=False, sim_mat_views=['N_row_sum'], 5 | run_enrichr=None, enrichrgram=None): 6 | ''' 7 | This will calculate multiple views of a clustergram by filtering the 8 | data and clustering after each filtering. This filtering will keep the top 9 | N rows based on some quantity (sum, num-non-zero, etc). 10 | ''' 11 | from copy import deepcopy 12 | import scipy 13 | from . import calc_clust, run_filter, make_views, make_sim_mat, cat_pval 14 | from . 
import enrichr_functions as enr_fun 15 | 16 | df = net.dat_to_df() 17 | 18 | threshold = 0.0001 19 | df = run_filter.df_filter_row_sum(df, threshold) 20 | df = run_filter.df_filter_col_sum(df, threshold) 21 | 22 | # default setting 23 | define_cat_colors = False 24 | 25 | if run_enrichr is not None: 26 | df = enr_fun.add_enrichr_cats(df, 'row', run_enrichr) 27 | 28 | define_cat_colors = True 29 | 30 | # calculate initial view with no row filtering 31 | net.df_to_dat(df, define_cat_colors=True) 32 | 33 | 34 | inst_dm = calc_clust.cluster_row_and_col(net, dist_type=dist_type, 35 | linkage_type=linkage_type, 36 | run_clustering=run_clustering, 37 | dendro=dendro, ignore_cat=False, 38 | calc_cat_pval=calc_cat_pval) 39 | 40 | all_views = [] 41 | send_df = deepcopy(df) 42 | 43 | if 'N_row_sum' in requested_views: 44 | all_views = make_views.N_rows(net, send_df, all_views, 45 | dist_type=dist_type, rank_type='sum') 46 | 47 | if 'N_row_var' in requested_views: 48 | all_views = make_views.N_rows(net, send_df, all_views, 49 | dist_type=dist_type, rank_type='var') 50 | 51 | if 'pct_row_sum' in requested_views: 52 | all_views = make_views.pct_rows(net, send_df, all_views, 53 | dist_type=dist_type, rank_type='sum') 54 | 55 | if 'pct_row_var' in requested_views: 56 | all_views = make_views.pct_rows(net, send_df, all_views, 57 | dist_type=dist_type, rank_type='var') 58 | 59 | which_sim = [] 60 | 61 | if sim_mat == True: 62 | which_sim = ['row', 'col'] 63 | elif sim_mat == 'row': 64 | which_sim = ['row'] 65 | elif sim_mat == 'col': 66 | which_sim = ['col'] 67 | 68 | if sim_mat is not False: 69 | sim_net = make_sim_mat.main(net, inst_dm, which_sim, filter_sim, sim_mat_views) 70 | 71 | net.sim = {} 72 | 73 | for inst_rc in which_sim: 74 | net.sim[inst_rc] = sim_net[inst_rc].viz 75 | 76 | if inst_rc == 'row': 77 | other_rc = 'col' 78 | elif inst_rc == 'col': 79 | other_rc = 'row' 80 | 81 | # keep track of cat_colors 82 | net.sim[inst_rc]['cat_colors'][inst_rc] = net.viz['cat_colors'][inst_rc] 83 | net.sim[inst_rc]['cat_colors'][other_rc] = net.viz['cat_colors'][inst_rc] 84 | 85 | else: 86 | net.sim = {} 87 | 88 | net.viz['views'] = all_views 89 | 90 | if enrichrgram != None: 91 | # toggle enrichrgram functionality from back-end 92 | net.viz['enrichrgram'] = enrichrgram 93 | 94 | if 'enrichrgram_lib' in net.dat: 95 | net.viz['enrichrgram'] = True 96 | net.viz['enrichrgram_lib'] = net.dat['enrichrgram_lib'] 97 | 98 | if 'row_cat_bars' in net.dat: 99 | net.viz['row_cat_bars'] = net.dat['row_cat_bars'] 100 | -------------------------------------------------------------------------------- /clustergrammer/make_sim_mat.py: -------------------------------------------------------------------------------- 1 | def main(net, inst_dm, which_sim, filter_sim, sim_mat_views=['N_row_sum']): 2 | from .__init__ import Network 3 | from copy import deepcopy 4 | from . 
import calc_clust, make_views 5 | 6 | print('in make_sim_mat, which_sim: ' + str(which_sim)) 7 | 8 | sim_dict = {} 9 | 10 | for inst_rc in which_sim: 11 | 12 | sim_dict[inst_rc] = dm_to_sim(inst_dm[inst_rc], make_squareform=True, 13 | filter_sim=filter_sim) 14 | 15 | sim_net = {} 16 | 17 | for inst_rc in which_sim: 18 | 19 | sim_net[inst_rc] = deepcopy(Network()) 20 | 21 | sim_net[inst_rc].dat['mat'] = sim_dict[inst_rc] 22 | 23 | sim_net[inst_rc].dat['nodes']['row'] = net.dat['nodes'][inst_rc] 24 | sim_net[inst_rc].dat['nodes']['col'] = net.dat['nodes'][inst_rc] 25 | 26 | sim_net[inst_rc].dat['node_info']['row'] = net.dat['node_info'][inst_rc] 27 | sim_net[inst_rc].dat['node_info']['col'] = net.dat['node_info'][inst_rc] 28 | 29 | calc_clust.cluster_row_and_col(sim_net[inst_rc]) 30 | 31 | all_views = [] 32 | df = sim_net[inst_rc].dat_to_df() 33 | send_df = deepcopy(df) 34 | 35 | if 'N_row_sum' in sim_mat_views: 36 | all_views = make_views.N_rows(net, send_df, all_views, 37 | dist_type='cos', rank_type='sum') 38 | 39 | sim_net[inst_rc].viz['views'] = all_views 40 | 41 | return sim_net 42 | 43 | def dm_to_sim(inst_dm, make_squareform=False, filter_sim=0): 44 | import numpy as np 45 | from scipy.spatial.distance import squareform 46 | 47 | if make_squareform is True: 48 | inst_dm = squareform(inst_dm) 49 | 50 | inst_sim_mat = 1 - inst_dm 51 | 52 | if filter_sim > 0: 53 | filter_sim = adjust_filter_sim(inst_sim_mat, filter_sim) 54 | inst_sim_mat[ np.abs(inst_sim_mat) < filter_sim] = 0 55 | 56 | return inst_sim_mat 57 | 58 | def adjust_filter_sim(inst_dm, filter_sim, keep_top=20000): 59 | import pandas as pd 60 | import numpy as np 61 | 62 | inst_df = pd.DataFrame(inst_dm) 63 | val_vect = np.abs(inst_df.values.flatten()) 64 | 65 | val_vect = val_vect[val_vect > 0.01] 66 | 67 | if len(val_vect) > keep_top: 68 | 69 | 70 | inst_series = pd.Series(val_vect) 71 | inst_series.sort_values(ascending=False) 72 | 73 | sort_values = inst_series.values 74 | 75 | filter_sim = sort_values[keep_top] 76 | 77 | return filter_sim -------------------------------------------------------------------------------- /clustergrammer/make_unique_labels.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | 3 | def main(net, df=None): 4 | ''' 5 | Run in load_data module (which runs when file is loaded or dataframe is loaded), 6 | check for duplicate row/col names, and add index to names if necesary 7 | ''' 8 | if df is None: 9 | df = net.export_df() 10 | 11 | # rows 12 | ############# 13 | rows = df.index.tolist() 14 | if type(rows[0]) is str: 15 | 16 | if len(rows) != len(list(set(rows))): 17 | new_rows = add_index_list(rows) 18 | df.index = new_rows 19 | 20 | elif type(rows[0]) is tuple: 21 | 22 | row_names = [] 23 | for inst_row in rows: 24 | row_names.append(inst_row[0]) 25 | 26 | if len(row_names) != len(list(set(row_names))): 27 | row_names = add_index_list(row_names) 28 | 29 | # add back to tuple 30 | new_rows = [] 31 | for inst_index in range(len(rows)): 32 | inst_row = rows[inst_index] 33 | new_row = list(inst_row) 34 | new_row[0] = row_names[inst_index] 35 | new_row = tuple(new_row) 36 | new_rows.append(new_row) 37 | 38 | df.index = new_rows 39 | 40 | # cols 41 | ############# 42 | cols = df.columns.tolist() 43 | if type(cols[0]) is str: 44 | 45 | # list column names 46 | if len(cols) != len(list(set(cols))): 47 | new_cols = add_index_list(cols) 48 | df.columns = new_cols 49 | 50 | elif type(cols[0]) is tuple: 51 | 52 | col_names = [] 53 | for inst_col in 
cols: 54 | col_names.append(inst_col[0]) 55 | 56 | if len(col_names) != len(list(set(col_names))): 57 | col_names = add_index_list(col_names) 58 | 59 | # add back to tuple 60 | new_cols = [] 61 | for inst_index in range(len(cols)): 62 | inst_col = cols[inst_index] 63 | new_col = list(inst_col) 64 | new_col[0] = col_names[inst_index] 65 | new_col = tuple(new_col) 66 | new_cols.append(new_col) 67 | 68 | df.columns = new_cols 69 | 70 | # return dataframe with unique names 71 | return df 72 | 73 | def add_index_list(nodes): 74 | 75 | new_nodes = [] 76 | for i in range(len(nodes)): 77 | index = i + 1 78 | inst_node = nodes[i] 79 | new_node = inst_node + '-' + str(index) 80 | new_nodes.append(new_node) 81 | 82 | return new_nodes 83 | -------------------------------------------------------------------------------- /clustergrammer/make_views.py: -------------------------------------------------------------------------------- 1 | def N_rows(net, df, all_views, dist_type='cosine', rank_type='sum'): 2 | from copy import deepcopy 3 | from .__init__ import Network 4 | from . import calc_clust, run_filter 5 | 6 | keep_top = ['all', 500, 250, 100, 50, 20, 10] 7 | 8 | rows_sorted = run_filter.get_sorted_rows(df['mat'], rank_type) 9 | 10 | for inst_keep in keep_top: 11 | 12 | tmp_df = deepcopy(df) 13 | 14 | check_keep_num = inst_keep 15 | 16 | # convert 'all' to -1 to clean up checking mechanism 17 | if check_keep_num == 'all': 18 | check_keep_num = -1 19 | 20 | if check_keep_num < len(rows_sorted): 21 | 22 | tmp_net = deepcopy(Network()) 23 | 24 | if inst_keep != 'all': 25 | 26 | keep_rows = rows_sorted[0:inst_keep] 27 | 28 | tmp_df['mat'] = tmp_df['mat'].ix[keep_rows] 29 | if 'mat_up' in tmp_df: 30 | tmp_df['mat_up'] = tmp_df['mat_up'].ix[keep_rows] 31 | tmp_df['mat_dn'] = tmp_df['mat_dn'].ix[keep_rows] 32 | if 'mat_orig' in tmp_df: 33 | tmp_df['mat_orig'] = tmp_df['mat_orig'].ix[keep_rows] 34 | 35 | tmp_df = run_filter.df_filter_col_sum(tmp_df, 0.001) 36 | tmp_net.df_to_dat(tmp_df) 37 | 38 | else: 39 | tmp_net.df_to_dat(tmp_df) 40 | 41 | try: 42 | try: 43 | calc_clust.cluster_row_and_col(tmp_net, dist_type, run_clustering=True) 44 | except: 45 | calc_clust.cluster_row_and_col(tmp_net, dist_type, run_clustering=False) 46 | 47 | # add view 48 | inst_view = {} 49 | inst_view['N_row_' + rank_type] = inst_keep 50 | inst_view['dist'] = 'cos' 51 | inst_view['nodes'] = {} 52 | inst_view['nodes']['row_nodes'] = tmp_net.viz['row_nodes'] 53 | inst_view['nodes']['col_nodes'] = tmp_net.viz['col_nodes'] 54 | all_views.append(inst_view) 55 | 56 | except: 57 | # print('\t*** did not cluster N filtered view') 58 | pass 59 | 60 | return all_views 61 | 62 | def pct_rows(net, df, all_views, dist_type, rank_type): 63 | from .__init__ import Network 64 | from copy import deepcopy 65 | import numpy as np 66 | from . 
import calc_clust, run_filter 67 | 68 | copy_net = deepcopy(net) 69 | 70 | if len(net.dat['node_info']['col']['cat']) > 0: 71 | cat_key_col = {} 72 | for i in range(len(net.dat['nodes']['col'])): 73 | cat_key_col[net.dat['nodes']['col'][i]] = \ 74 | net.dat['node_info']['col']['cat'][i] 75 | 76 | all_filt = list(range(10)) 77 | all_filt = [i / float(10) for i in all_filt] 78 | 79 | mat = deepcopy(df['mat']) 80 | sum_row = np.sum(mat, axis=1) 81 | max_sum = max(sum_row) 82 | 83 | for inst_filt in all_filt: 84 | 85 | cutoff = inst_filt * max_sum 86 | copy_net = deepcopy(net) 87 | inst_df = deepcopy(df) 88 | inst_df = run_filter.df_filter_row_sum(inst_df, cutoff, take_abs=False) 89 | 90 | tmp_net = deepcopy(Network()) 91 | tmp_net.df_to_dat(inst_df) 92 | 93 | try: 94 | try: 95 | calc_clust.cluster_row_and_col(tmp_net, dist_type=dist_type, 96 | run_clustering=True) 97 | 98 | except: 99 | calc_clust.cluster_row_and_col(tmp_net, dist_type=dist_type, 100 | run_clustering=False) 101 | 102 | inst_view = {} 103 | inst_view['pct_row_' + rank_type] = inst_filt 104 | inst_view['dist'] = 'cos' 105 | inst_view['nodes'] = {} 106 | inst_view['nodes']['row_nodes'] = tmp_net.viz['row_nodes'] 107 | inst_view['nodes']['col_nodes'] = tmp_net.viz['col_nodes'] 108 | 109 | all_views.append(inst_view) 110 | 111 | except: 112 | pass 113 | 114 | return all_views -------------------------------------------------------------------------------- /clustergrammer/make_viz.py: -------------------------------------------------------------------------------- 1 | def viz_json(net, dendro=True, links=False): 2 | ''' make the dictionary for the clustergram.js visualization ''' 3 | from . import calc_clust 4 | import numpy as np 5 | 6 | all_dist = calc_clust.group_cutoffs() 7 | 8 | for inst_rc in net.dat['nodes']: 9 | 10 | inst_keys = net.dat['node_info'][inst_rc] 11 | all_cats = [x for x in inst_keys if 'cat-' in x] 12 | 13 | for i in range(len(net.dat['nodes'][inst_rc])): 14 | inst_dict = {} 15 | inst_dict['name'] = net.dat['nodes'][inst_rc][i] 16 | inst_dict['ini'] = net.dat['node_info'][inst_rc]['ini'][i] 17 | inst_dict['clust'] = net.dat['node_info'][inst_rc]['clust'].index(i) 18 | inst_dict['rank'] = net.dat['node_info'][inst_rc]['rank'][i] 19 | 20 | if 'rankvar' in inst_keys: 21 | inst_dict['rankvar'] = net.dat['node_info'][inst_rc]['rankvar'][i] 22 | 23 | # fix for similarity matrix 24 | if len(all_cats) > 0: 25 | 26 | for inst_name_cat in all_cats: 27 | 28 | actual_cat_name = net.dat['node_info'][inst_rc][inst_name_cat][i] 29 | inst_dict[inst_name_cat] = actual_cat_name 30 | 31 | check_pval = 'pval_'+inst_name_cat.replace('-','_') 32 | 33 | if check_pval in net.dat['node_info'][inst_rc]: 34 | tmp_pval_name = inst_name_cat.replace('-','_') + '_pval' 35 | inst_dict[tmp_pval_name] = net.dat['node_info'][inst_rc][check_pval][actual_cat_name] 36 | 37 | tmp_index_name = inst_name_cat.replace('-', '_') + '_index' 38 | 39 | inst_dict[tmp_index_name] = net.dat['node_info'][inst_rc] \ 40 | [tmp_index_name][i] 41 | 42 | 43 | if len(net.dat['node_info'][inst_rc]['value']) > 0: 44 | inst_dict['value'] = net.dat['node_info'][inst_rc]['value'][i] 45 | 46 | if len(net.dat['node_info'][inst_rc]['info']) > 0: 47 | inst_dict['info'] = net.dat['node_info'][inst_rc]['info'][i] 48 | 49 | if dendro is True: 50 | inst_dict['group'] = [] 51 | for tmp_dist in all_dist: 52 | tmp_dist = str(tmp_dist).replace('.', '') 53 | tmp_append = float( 54 | net.dat['node_info'][inst_rc]['group'][tmp_dist][i]) 55 | inst_dict['group'].append(tmp_append) 56 | 57 
| net.viz[inst_rc + '_nodes'].append(inst_dict) 58 | 59 | mat_types = ['mat', 'mat_orig', 'mat_info', 'mat_hl', 'mat_up', 'mat_dn'] 60 | 61 | # save data as links or mat 62 | ########################### 63 | if links is True: 64 | for i in range(len(net.dat['nodes']['row'])): 65 | for j in range(len(net.dat['nodes']['col'])): 66 | 67 | inst_dict = {} 68 | inst_dict['source'] = i 69 | inst_dict['target'] = j 70 | inst_dict['value'] = float(net.dat['mat'][i, j]) 71 | 72 | if 'mat_up' in net.dat: 73 | inst_dict['value_up'] = net.dat['mat_up'][i, j] 74 | inst_dict['value_dn'] = net.dat['mat_dn'][i, j] 75 | 76 | if 'mat_orig' in net.dat: 77 | inst_dict['value_orig'] = net.dat['mat_orig'][i, j] 78 | 79 | if np.isnan(inst_dict['value_orig']): 80 | inst_dict['value_orig'] = 'NaN' 81 | 82 | 83 | if 'mat_info' in net.dat: 84 | inst_dict['info'] = net.dat['mat_info'][str((i, j))] 85 | 86 | if 'mat_hl' in net.dat: 87 | inst_dict['highlight'] = net.dat['mat_hl'][i, j] 88 | 89 | net.viz['links'].append(inst_dict) 90 | 91 | else: 92 | for inst_mat in mat_types: 93 | if inst_mat in net.dat: 94 | net.viz[inst_mat] = net.dat[inst_mat].tolist() 95 | 96 | 97 | -------------------------------------------------------------------------------- /clustergrammer/normalize_fun.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from copy import deepcopy 4 | 5 | def run_norm(net, df=None, norm_type='zscore', axis='row', keep_orig=False): 6 | ''' 7 | A dataframe (more accurately a dictionary of dataframes, e.g. mat, 8 | mat_up...) can be passed to run_norm and a normalization will be run ( 9 | e.g. zscore) on either the rows or columns 10 | ''' 11 | 12 | # df here is actually a dictionary of several dataframes, 'mat', 'mat_orig', 13 | # etc 14 | if df is None: 15 | df = net.dat_to_df() 16 | 17 | if norm_type == 'zscore': 18 | df = zscore_df(df, axis, keep_orig) 19 | 20 | if norm_type == 'qn': 21 | df = qn_df(df, axis, keep_orig) 22 | 23 | net.df_to_dat(df) 24 | 25 | def qn_df(df, axis='row', keep_orig=False): 26 | ''' 27 | do quantile normalization of a dataframe dictionary, does not write to net 28 | ''' 29 | df_qn = {} 30 | 31 | for mat_type in df: 32 | inst_df = df[mat_type] 33 | 34 | # using transpose to do row qn 35 | if axis == 'row': 36 | inst_df = inst_df.transpose() 37 | 38 | missing_values = inst_df.isnull().values.any() 39 | 40 | # make mask of missing values 41 | if missing_values: 42 | 43 | # get nan mask 44 | missing_mask = pd.isnull(inst_df) 45 | 46 | # tmp fill in na with zero, will not affect qn 47 | inst_df = inst_df.fillna(value=0) 48 | 49 | # calc common distribution 50 | common_dist = calc_common_dist(inst_df) 51 | 52 | # swap in common distribution 53 | inst_df = swap_in_common_dist(inst_df, common_dist) 54 | 55 | # swap back in missing values 56 | if missing_values: 57 | inst_df = inst_df.mask(missing_mask, other=np.nan) 58 | 59 | # using transpose to do row qn 60 | if axis == 'row': 61 | inst_df = inst_df.transpose() 62 | 63 | df_qn[mat_type] = inst_df 64 | 65 | return df_qn 66 | 67 | def swap_in_common_dist(df, common_dist): 68 | 69 | col_names = df.columns.tolist() 70 | 71 | qn_arr = np.array([]) 72 | orig_rows = df.index.tolist() 73 | 74 | # loop through each column 75 | for inst_col in col_names: 76 | 77 | # get the sorted list of row names for the given column 78 | tmp_series = deepcopy(df[inst_col]) 79 | tmp_series = tmp_series.sort_values(ascending=False) 80 | sorted_names = tmp_series.index.tolist() 81 | 
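  # for each original row, find its (descending) rank within this column and
  # take the value at that rank from the common distribution; this rank-based
  # swap is the quantile-normalization step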
82 | qn_vect = np.array([]) 83 | for inst_row in orig_rows: 84 | inst_index = sorted_names.index(inst_row) 85 | inst_val = common_dist[inst_index] 86 | qn_vect = np.hstack((qn_vect, inst_val)) 87 | 88 | if qn_arr.shape[0] == 0: 89 | qn_arr = qn_vect 90 | else: 91 | qn_arr = np.vstack((qn_arr, qn_vect)) 92 | 93 | # transpose (because of vstacking) 94 | qn_arr = qn_arr.transpose() 95 | 96 | qn_df = pd.DataFrame(data=qn_arr, columns=col_names, index=orig_rows) 97 | 98 | return qn_df 99 | 100 | def calc_common_dist(df): 101 | ''' 102 | calculate a common distribution (for col qn only) that will be used to qn 103 | ''' 104 | 105 | # axis is col 106 | tmp_arr = np.array([]) 107 | 108 | col_names = df.columns.tolist() 109 | 110 | for inst_col in col_names: 111 | 112 | # sort column 113 | tmp_vect = df[inst_col].sort_values(ascending=False).values 114 | 115 | # stacking rows vertically (will transpose) 116 | if tmp_arr.shape[0] == 0: 117 | tmp_arr = tmp_vect 118 | else: 119 | tmp_arr = np.vstack((tmp_arr, tmp_vect)) 120 | 121 | tmp_arr = tmp_arr.transpose() 122 | 123 | common_dist = tmp_arr.mean(axis=1) 124 | 125 | return common_dist 126 | 127 | def zscore_df(df, axis='row', keep_orig=False): 128 | ''' 129 | take the zscore of a dataframe dictionary, does not write to net (self) 130 | ''' 131 | df_z = {} 132 | 133 | for mat_type in df: 134 | if keep_orig and mat_type == 'mat': 135 | mat_orig = deepcopy(df[mat_type]) 136 | 137 | inst_df = df[mat_type] 138 | 139 | if axis == 'row': 140 | inst_df = inst_df.transpose() 141 | 142 | df_z[mat_type] = (inst_df - inst_df.mean())/inst_df.std() 143 | 144 | if axis == 'row': 145 | df_z[mat_type] = df_z[mat_type].transpose() 146 | 147 | if keep_orig: 148 | df_z['mat_orig'] = mat_orig 149 | 150 | return df_z 151 | -------------------------------------------------------------------------------- /clustergrammer/proc_df_labels.py: -------------------------------------------------------------------------------- 1 | def main(df): 2 | ''' 3 | 1) check that rows are strings (in case of numerical names) 4 | 2) check for tuples, and in that case load tuples to categories 5 | ''' 6 | import numpy as np 7 | from ast import literal_eval as make_tuple 8 | 9 | test = {} 10 | test['row'] = df['mat'].index.tolist() 11 | test['col'] = df['mat'].columns.tolist() 12 | 13 | # if type( test_row ) is not str and type( test_row ) is not tuple: 14 | 15 | found_tuple = {} 16 | found_number = {} 17 | for inst_rc in ['row','col']: 18 | 19 | inst_name = test[inst_rc][0] 20 | 21 | found_tuple[inst_rc] = False 22 | found_number[inst_rc] = False 23 | 24 | if type(inst_name) != tuple: 25 | 26 | if type(inst_name) is int or type(inst_name) is float or type(inst_name) is np.int64: 27 | found_number[inst_rc] = True 28 | 29 | else: 30 | check_open = inst_name[0] 31 | check_comma = inst_name.find(',') 32 | check_close = inst_name[-1] 33 | 34 | if check_open == '(' and check_close == ')' and check_comma > 0 \ 35 | and check_comma < len(inst_name): 36 | found_tuple[inst_rc] = True 37 | 38 | # convert to tuple if necessary 39 | ################################################# 40 | if found_tuple['row']: 41 | row_names = df['mat'].index.tolist() 42 | row_names = [make_tuple(x) for x in row_names] 43 | df['mat'].index = row_names 44 | 45 | if found_tuple['col']: 46 | col_names = df['mat'].columns.tolist() 47 | col_names = [make_tuple(x) for x in col_names] 48 | df['mat'].columns = col_names 49 | 50 | # convert numbers to string if necessary 51 | ################################################# 52 | 
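# Purely numeric labels (int, float, np.int64) are cast to strings here;
# downstream label handling is assumed to expect string (or tuple-of-string)
# row/column names, e.g. when names are serialized for the visualization JSON.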
if found_number['row']: 53 | row_names = df['mat'].index.tolist() 54 | row_names = [str(x) for x in row_names] 55 | df['mat'].index = row_names 56 | 57 | if found_number['col']: 58 | col_names = df['mat'].columns.tolist() 59 | col_names = [str(x) for x in col_names] 60 | df['mat'].columns = col_names 61 | 62 | return df -------------------------------------------------------------------------------- /clustergrammer/run_filter.py: -------------------------------------------------------------------------------- 1 | def df_filter_row_sum(df, threshold, take_abs=True): 2 | ''' filter rows in matrix at some threshold 3 | and remove columns that have a sum below this threshold ''' 4 | 5 | from copy import deepcopy 6 | from .__init__ import Network 7 | net = Network() 8 | 9 | if take_abs is True: 10 | df_copy = deepcopy(df['mat'].abs()) 11 | else: 12 | df_copy = deepcopy(df['mat']) 13 | 14 | ini_rows = df_copy.index.values.tolist() 15 | df_copy = df_copy.transpose() 16 | tmp_sum = df_copy.sum(axis=0) 17 | tmp_sum = tmp_sum.abs() 18 | tmp_sum.sort_values(inplace=True, ascending=False) 19 | 20 | tmp_sum = tmp_sum[tmp_sum > threshold] 21 | keep_rows = sorted(tmp_sum.index.values.tolist()) 22 | 23 | if len(keep_rows) < len(ini_rows): 24 | df['mat'] = grab_df_subset(df['mat'], keep_rows=keep_rows) 25 | 26 | if 'mat_up' in df: 27 | df['mat_up'] = grab_df_subset(df['mat_up'], keep_rows=keep_rows) 28 | df['mat_dn'] = grab_df_subset(df['mat_dn'], keep_rows=keep_rows) 29 | 30 | if 'mat_orig' in df: 31 | df['mat_orig'] = grab_df_subset(df['mat_orig'], keep_rows=keep_rows) 32 | 33 | return df 34 | 35 | def df_filter_col_sum(df, threshold, take_abs=True): 36 | ''' filter columns in matrix at some threshold 37 | and remove rows that have all zero values ''' 38 | 39 | from copy import deepcopy 40 | from .__init__ import Network 41 | net = Network() 42 | 43 | if take_abs is True: 44 | df_copy = deepcopy(df['mat'].abs()) 45 | else: 46 | df_copy = deepcopy(df['mat']) 47 | 48 | df_copy = df_copy.transpose() 49 | df_copy = df_copy[df_copy.sum(axis=1) > threshold] 50 | df_copy = df_copy.transpose() 51 | df_copy = df_copy[df_copy.sum(axis=1) > 0] 52 | 53 | if take_abs is True: 54 | inst_rows = df_copy.index.tolist() 55 | inst_cols = df_copy.columns.tolist() 56 | df['mat'] = grab_df_subset(df['mat'], inst_rows, inst_cols) 57 | 58 | if 'mat_up' in df: 59 | df['mat_up'] = grab_df_subset(df['mat_up'], inst_rows, inst_cols) 60 | df['mat_dn'] = grab_df_subset(df['mat_dn'], inst_rows, inst_cols) 61 | 62 | if 'mat_orig' in df: 63 | df['mat_orig'] = grab_df_subset(df['mat_orig'], inst_rows, inst_cols) 64 | 65 | else: 66 | df['mat'] = df_copy 67 | 68 | return df 69 | 70 | def grab_df_subset(df, keep_rows='all', keep_cols='all'): 71 | if keep_cols != 'all': 72 | df = df[keep_cols] 73 | if keep_rows != 'all': 74 | df = df.ix[keep_rows] 75 | return df 76 | 77 | def get_sorted_rows(df, rank_type='sum'): 78 | from copy import deepcopy 79 | 80 | inst_df = deepcopy(df) 81 | inst_df = inst_df.transpose() 82 | 83 | if rank_type == 'sum': 84 | tmp_sum = inst_df.sum(axis=0) 85 | elif rank_type == 'var': 86 | tmp_sum = inst_df.var(axis=0) 87 | 88 | tmp_sum = tmp_sum.abs() 89 | tmp_sum.sort_values(inplace=True, ascending=False) 90 | rows_sorted = tmp_sum.index.values.tolist() 91 | 92 | return rows_sorted 93 | 94 | def filter_N_top(inst_rc, df, N_top, rank_type='sum'): 95 | 96 | if inst_rc == 'col': 97 | for inst_type in df: 98 | df[inst_type] = df[inst_type].transpose() 99 | 100 | rows_sorted = get_sorted_rows(df['mat'], rank_type) 101 | 
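# Keep only the N_top rows with the largest sum (or variance, per rank_type);
# the same row subset is applied below to any companion matrices that are
# present (mat_up, mat_dn, mat_orig).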
102 | keep_rows = rows_sorted[:N_top] 103 | 104 | df['mat'] = df['mat'].ix[keep_rows] 105 | if 'mat_up' in df: 106 | df['mat_up'] = df['mat_up'].ix[keep_rows] 107 | df['mat_dn'] = df['mat_dn'].ix[keep_rows] 108 | 109 | if 'mat_orig' in df: 110 | df['mat_orig'] = df['mat_orig'].ix[keep_rows] 111 | 112 | if inst_rc == 'col': 113 | for inst_type in df: 114 | df[inst_type] = df[inst_type].transpose() 115 | 116 | return df 117 | 118 | def filter_threshold(df, inst_rc, threshold, num_occur=1): 119 | ''' 120 | Filter a network's rows or cols based on num_occur values being above a 121 | threshold (in absolute_value) 122 | ''' 123 | from copy import deepcopy 124 | 125 | inst_df = deepcopy(df['mat']) 126 | 127 | if inst_rc == 'col': 128 | inst_df = inst_df.transpose() 129 | 130 | inst_df = inst_df.abs() 131 | 132 | ini_rows = inst_df.index.values.tolist() 133 | 134 | inst_df[inst_df < threshold] = 0 135 | inst_df[inst_df >= threshold] = 1 136 | 137 | tmp_sum = inst_df.sum(axis=1) 138 | 139 | tmp_sum = tmp_sum[tmp_sum >= num_occur] 140 | 141 | keep_names = tmp_sum.index.values.tolist() 142 | 143 | if inst_rc == 'row': 144 | if len(keep_names) < len(ini_rows): 145 | df['mat'] = grab_df_subset(df['mat'], keep_rows=keep_names) 146 | 147 | if 'mat_up' in df: 148 | df['mat_up'] = grab_df_subset(df['mat_up'], keep_rows=keep_names) 149 | df['mat_dn'] = grab_df_subset(df['mat_dn'], keep_rows=keep_names) 150 | 151 | if 'mat_orig' in df: 152 | df['mat_orig'] = grab_df_subset(df['mat_orig'], keep_rows=keep_names) 153 | 154 | elif inst_rc == 'col': 155 | inst_df = inst_df.transpose() 156 | 157 | inst_rows = inst_df.index.values.tolist() 158 | inst_cols = keep_names 159 | 160 | df['mat'] = grab_df_subset(df['mat'], inst_rows, inst_cols) 161 | 162 | if 'mat_up' in df: 163 | df['mat_up'] = grab_df_subset(df['mat_up'], inst_rows, inst_cols) 164 | df['mat_dn'] = grab_df_subset(df['mat_dn'], inst_rows, inst_cols) 165 | 166 | if 'mat_orig' in df: 167 | df['mat_orig'] = grab_df_subset(df['mat_orig'], inst_rows, inst_cols) 168 | 169 | return df 170 | 171 | def filter_cat(net, axis, cat_index, cat_name): 172 | 173 | try: 174 | df = net.export_df() 175 | 176 | # DataFrame filtering will be run always be run on columns if the user 177 | # wants to filter rows, transpose the matrix before and after 178 | if axis == 'row': 179 | df = df.transpose() 180 | 181 | all_names = df.columns.tolist() 182 | 183 | found_names = [i for i in all_names if i[cat_index] == cat_name] 184 | 185 | if len(found_names) > 0: 186 | df = df[found_names] 187 | 188 | if axis == 'row': 189 | df = df.transpose() 190 | else: 191 | print('no ' + axis + 's were found with this category and filtering was not run') 192 | 193 | net.load_df(df) 194 | 195 | except: 196 | print('category filtering did not run\n check that your category filtering is set up correctly') 197 | 198 | 199 | def filter_names(net, axis, names): 200 | 201 | print('filter_names') 202 | print(names) 203 | 204 | try: 205 | 206 | df = net.export_df() 207 | 208 | # Dataframe filtering will always be run on the columns. If the user wants to filter rows, then it will transpose back and forth. 
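# The names being matched may be plain strings or tuples (tuples appear when
# categories are attached to the labels); only the first tuple element is
# compared, and any 'Title: ' prefix is stripped first, so a row labeled
# 'Gene: EGFR' would be matched by passing 'EGFR' in the names list.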
209 | 210 | if axis == 'row': 211 | df = df.transpose() 212 | 213 | all_names = df.columns.tolist() 214 | 215 | found_names = [] 216 | for inst_name in all_names: 217 | 218 | if type(inst_name) is tuple: 219 | check_name = inst_name[0] 220 | else: 221 | check_name = inst_name 222 | 223 | if ': ' in check_name: 224 | check_name = check_name.split(': ')[1] 225 | 226 | if check_name in names: 227 | found_names.append(inst_name) 228 | 229 | if len(found_names) > 0: 230 | df = df[found_names] 231 | 232 | if axis == 'row': 233 | df = df.transpose() 234 | 235 | net.load_df(df) 236 | 237 | else: 238 | print('no ' + axis + 's were found with these names') 239 | 240 | except: 241 | print('error in filtering names') 242 | 243 | print(found_names) -------------------------------------------------------------------------------- /make_clustergrammer.py: -------------------------------------------------------------------------------- 1 | ''' 2 | The clustergrammer python module can be installed using pip: 3 | pip install clustergrammer 4 | 5 | or by getting the code from the repo: 6 | https://github.com/MaayanLab/clustergrammer-py 7 | ''' 8 | 9 | # from clustergrammer import Network 10 | from clustergrammer import Network 11 | net = Network() 12 | 13 | # load matrix tsv file 14 | net.load_file('txt/rc_two_cats.txt') 15 | # net.load_file('txt/rc_val_cats.txt') 16 | 17 | # optional filtering and normalization 18 | ########################################## 19 | # net.filter_sum('row', threshold=20) 20 | # net.normalize(axis='col', norm_type='zscore', keep_orig=True) 21 | # net.filter_N_top('row', 250, rank_type='sum') 22 | # net.filter_threshold('row', threshold=3.0, num_occur=4) 23 | # net.swap_nan_for_zero() 24 | # net.downsample(ds_type='kmeans', axis='col', num_samples=10) 25 | # net.random_sample(random_state=100, num_samples=10, axis='col') 26 | # net.clip(-6,6) 27 | # net.filter_cat('row', 1, 'Gene Type: Interesting') 28 | # net.set_cat_color('col', 1, 'Category: one', 'blue') 29 | 30 | net.cluster(dist_type='cos',views=['N_row_sum', 'N_row_var'] , dendro=True, 31 | sim_mat=True, filter_sim=0.1, calc_cat_pval=False, enrichrgram=True) 32 | 33 | # write jsons for front-end visualizations 34 | net.write_json_to_file('viz', 'json/mult_view.json', 'no-indent') 35 | net.write_json_to_file('sim_row', 'json/mult_view_sim_row.json', 'no-indent') 36 | net.write_json_to_file('sim_col', 'json/mult_view_sim_col.json', 'no-indent') 37 | -------------------------------------------------------------------------------- /make_stdin_stdout.py: -------------------------------------------------------------------------------- 1 | ''' 2 | The clustergrammer python module can be installed using pip: 3 | pip install clustergrammer 4 | 5 | or by getting the code from the repo: 6 | https://github.com/MaayanLab/clustergrammer-py 7 | ''' 8 | 9 | # from clustergrammer import Network 10 | from clustergrammer import Network 11 | net = Network() 12 | 13 | # load matrix tsv file 14 | net.load_stdin() 15 | 16 | # optional filtering and normalization 17 | ########################################## 18 | # net.filter_sum('row', threshold=20) 19 | # net.normalize(axis='col', norm_type='zscore', keep_orig=True) 20 | # net.filter_N_top('row', 250, rank_type='sum') 21 | # net.filter_threshold('row', threshold=3.0, num_occur=4) 22 | # net.swap_nan_for_zero() 23 | 24 | net.make_clust(dist_type='cos',views=['N_row_sum', 'N_row_var'] , dendro=True, 25 | sim_mat=True, filter_sim=0.1, calc_cat_pval=False) 26 | 27 | # output jsons for front-end 
visualizations 28 | print(net.export_net_json('viz', 'no-indent')) -------------------------------------------------------------------------------- /python27 new import.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "\n", 14 | "# import clustergrammer_widget\n", 15 | "from clustergrammer_widget import *\n", 16 | "\n", 17 | "# use local clustergrammer\n", 18 | "from clustergrammer import Network" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "data": { 28 | "text/plain": [ 29 | "'\\n version 1.12.4\\n\\n Clustergrammer.py takes a matrix as input (either from a file of a Pandas DataFrame), normalizes/filters, hierarchically clusters, and produces the :ref:`visualization_json` for :ref:`clustergrammer_js`.\\n\\n Networks have two states:\\n\\n 1. the data state, where they are stored as a matrix and nodes\\n 2. the viz state where they are stored as viz.links, viz.row_nodes, and viz.col_nodes.\\n\\n The goal is to start in a data-state and produce a viz-state of\\n the network that will be used as input to clustergram.js.\\n '" 30 | ] 31 | }, 32 | "execution_count": 2, 33 | "metadata": {}, 34 | "output_type": "execute_result" 35 | } 36 | ], 37 | "source": [ 38 | "net = Network(clustergrammer_widget)\n", 39 | "net.__doc__" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 10, 45 | "metadata": { 46 | "collapsed": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "# generate random matrix\n", 51 | "num_rows = 500\n", 52 | "num_cols = 10\n", 53 | "np.random.seed(seed=100)\n", 54 | "mat = np.random.rand(num_rows, num_cols)\n", 55 | "\n", 56 | "# make row and col labels\n", 57 | "rows = range(num_rows)\n", 58 | "cols = range(num_cols)\n", 59 | "rows = [str(i) for i in rows]\n", 60 | "cols = [str(i) for i in cols]\n", 61 | "\n", 62 | "# make dataframe \n", 63 | "df = pd.DataFrame(data=mat, columns=cols, index=rows)" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 11, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "application/vnd.jupyter.widget-view+json": { 74 | "model_id": "dfd54aa4aff347d2be76a753c0fdee93" 75 | } 76 | }, 77 | "metadata": {}, 78 | "output_type": "display_data" 79 | } 80 | ], 81 | "source": [ 82 | "net.load_df(df)\n", 83 | "net.cluster()\n", 84 | "net.widget()" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 12, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "data": { 94 | "application/vnd.jupyter.widget-view+json": { 95 | "model_id": "7b8846afd6cd4aa49556935cc23350e9" 96 | } 97 | }, 98 | "metadata": {}, 99 | "output_type": "display_data" 100 | } 101 | ], 102 | "source": [ 103 | "net.load_file('txt/rc_two_cats.txt')\n", 104 | "net.cluster()\n", 105 | "net.widget()" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": { 112 | "collapsed": true 113 | }, 114 | "outputs": [], 115 | "source": [] 116 | } 117 | ], 118 | "metadata": { 119 | "anaconda-cloud": {}, 120 | "kernelspec": { 121 | "display_name": "Python [Root]", 122 | "language": "python", 123 | "name": "Python [Root]" 124 | }, 125 | "language_info": { 126 | "codemirror_mode": { 127 | "name": "ipython", 128 | "version": 2 129 | }, 130 | 
"file_extension": ".py", 131 | "mimetype": "text/x-python", 132 | "name": "python", 133 | "nbconvert_exporter": "python", 134 | "pygments_lexer": "ipython2", 135 | "version": "2.7.12" 136 | } 137 | }, 138 | "nbformat": 4, 139 | "nbformat_minor": 2 140 | } 141 | -------------------------------------------------------------------------------- /python35_new_import.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "\n", 14 | "# import clustergrammer_widget\n", 15 | "from clustergrammer_widget import *\n", 16 | "\n", 17 | "# use local clustergrammer\n", 18 | "from clustergrammer import Network" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "data": { 28 | "text/plain": [ 29 | "'\\n version 1.12.4\\n\\n Clustergrammer.py takes a matrix as input (either from a file of a Pandas DataFrame), normalizes/filters, hierarchically clusters, and produces the :ref:`visualization_json` for :ref:`clustergrammer_js`.\\n\\n Networks have two states:\\n\\n 1. the data state, where they are stored as a matrix and nodes\\n 2. the viz state where they are stored as viz.links, viz.row_nodes, and viz.col_nodes.\\n\\n The goal is to start in a data-state and produce a viz-state of\\n the network that will be used as input to clustergram.js.\\n '" 30 | ] 31 | }, 32 | "execution_count": 2, 33 | "metadata": {}, 34 | "output_type": "execute_result" 35 | } 36 | ], 37 | "source": [ 38 | "net = Network(clustergrammer_widget)\n", 39 | "net.__doc__" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 6, 45 | "metadata": { 46 | "collapsed": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "# generate random matrix\n", 51 | "num_rows = 500\n", 52 | "num_cols = 10\n", 53 | "np.random.seed(seed=100)\n", 54 | "mat = np.random.rand(num_rows, num_cols)\n", 55 | "\n", 56 | "# make row and col labels\n", 57 | "rows = range(num_rows)\n", 58 | "cols = range(num_cols)\n", 59 | "rows = [str(i) for i in rows]\n", 60 | "cols = [str(i) for i in cols]\n", 61 | "\n", 62 | "# make dataframe \n", 63 | "df = pd.DataFrame(data=mat, columns=cols, index=rows)" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 7, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "application/vnd.jupyter.widget-view+json": { 74 | "model_id": "e7f1dc60de214594b83af2b5c77284c3" 75 | } 76 | }, 77 | "metadata": {}, 78 | "output_type": "display_data" 79 | } 80 | ], 81 | "source": [ 82 | "net.load_df(df)\n", 83 | "net.cluster()\n", 84 | "net.widget()" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 9, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "data": { 94 | "application/vnd.jupyter.widget-view+json": { 95 | "model_id": "0f403b0b1b604879bcd3154dd8020c1b" 96 | } 97 | }, 98 | "metadata": {}, 99 | "output_type": "display_data" 100 | } 101 | ], 102 | "source": [ 103 | "net.load_file('txt/rc_two_cats.txt')\n", 104 | "net.cluster()\n", 105 | "net.widget()" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": { 112 | "collapsed": true 113 | }, 114 | "outputs": [], 115 | "source": [] 116 | } 117 | ], 118 | "metadata": { 119 | "anaconda-cloud": {}, 120 | "kernelspec": { 121 | "display_name": "Python 
[py35]", 122 | "language": "python", 123 | "name": "Python [py35]" 124 | }, 125 | "language_info": { 126 | "codemirror_mode": { 127 | "name": "ipython", 128 | "version": 3 129 | }, 130 | "file_extension": ".py", 131 | "mimetype": "text/x-python", 132 | "name": "python", 133 | "nbconvert_exporter": "python", 134 | "pygments_lexer": "ipython3", 135 | "version": "3.5.2" 136 | } 137 | }, 138 | "nbformat": 4, 139 | "nbformat_minor": 2 140 | } 141 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file = README.md -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | setup( 3 | name = 'clustergrammer', 4 | packages = ['clustergrammer'], # this must be the same as the name above 5 | version = '1.13.6', 6 | description = 'A python module for the Clustergrammer visualization project', 7 | author = 'Nicolas Fernandez', 8 | author_email = 'nickfloresfernandez@gmail.com', 9 | url = 'https://github.com/MaayanLab/clustergrammer-py', 10 | download_url = 'https://github.com/MaayanLab/clustergrammer-py/tarball/1.1.2', 11 | keywords = ['testing'], 12 | classifiers = [], 13 | ) -------------------------------------------------------------------------------- /txt/example_tsv.txt: -------------------------------------------------------------------------------- 1 | Col-1 Col-2 Col-3 Col-4 Col-5 Col-6 Col-7 Col-8 Col-9 Col-10 Col-11 Col-12 Col-13 Col-14 Col-15 Col-16 Col-17 Col-18 Col-19 Col-20 Col-21 Col-22 Col-23 Col-24 Col-25 Col-26 Col-27 Col-28 Col-29 2 | CDK4 -0.792803571 0.527687127 0.000622536 0.356722594 0.933286088 -0.131728538 0.808451944 4.240884801 -0.540231391 -0.981456952 -0.84689892 -0.252795921 0.114189581 -0.06649884 0.149218809 1.351263924 0.645867212 0.60561098 3.232454573 0.342634572 -0.430912324 -0.40590567 0.199563989 -1.122536294 2.210334571 0.405126315 -0.089763159 0.405126315 0.340012773 3 | LMTK3 0.17762054 -0.016061489 5.422113833 1.307039675 0.355814985 0.276904994 0.483153915 -0.240495821 1.336445996 1.149618502 0.361412978 -0.380518938 -0.213541004 -0.471938639 -0.620858723 -0.163637058 -0.487256142 -0.029569688 -0.232057778 -0.669036939 -0.449241698 1.158930406 0.511962022 2.370834155 0.262893885 -0.513128895 -0.501210068 0.439277561 -0.342460508 4 | LRRK2 -0.697876151 -0.555610265 -0.360497559 -0.460236731 -0.680760697 -0.169463518 1.715708875 -0.517104823 0.184987709 0.8106597 -0.440334448 -0.621052026 -0.086803358 -0.753966225 -0.401972037 -0.562086752 -0.560644597 0.542301381 -0.382639145 -0.377853523 -0.713472923 -0.377609368 4.308904581 -0.638131949 -0.556114063 -0.318145763 -0.489582714 1.677376527 -0.682790464 5 | UHMK1 0.850546518 -0.263279907 0.179253031 0.398646721 1.537663802 0.505291411 0.902366491 -0.16628803 0.630730564 0.399448283 0.847171367 -0.442268094 0.44368676 1.552969029 1.110283483 -0.326698072 -0.405267374 0.663747183 0.424470033 0.283221899 -4.243973921 0.718315578 1.747343933 -1.020927175 0.305028514 1.47174613 0.048902278 -0.255283556 0.548224573 6 | EGFR 1.412416216 0.018987506 0.902251622 -0.17813747 0.781819022 0.211815895 -0.023427175 3.557295952 1.173783556 -0.012362164 0.769782484 -0.681031743 -1.047375389 0.652065499 0.172316691 2.072433469 1.135709377 -0.169977181 0.881067136 -0.486159025 -1.451838026 0.371237737 -0.581665325 
-0.126356157 0.241004724 1.06526919 0.974531796 0.668645091 0.05696489 7 | STK32A -0.388039665 -0.592626614 -0.24413651 0.740364734 3.023348415 -0.433985412 -0.630124457 1.156531983 0.433696213 3.84950782 -0.225425742 -0.656106808 -0.311953357 -0.397450226 1.044025538 -0.247816912 3.640524345 -0.59251039 0.514666245 -0.45396994 1.649737631 3.366020313 -0.430502237 -0.295312303 2.824551497 -0.014275115 -0.410477794 -0.229717784 3.709828616 8 | NRK 1.408537135 -0.017369325 -0.367127962 0.313253548 -0.16288686 0.027411933 -0.281351556 5.813846489 -0.161706584 0.472386752 -0.33979584 0.669956625 -0.2596391 -0.386601295 -0.293654593 4.390499721 -0.420942214 -0.402955154 -0.346809494 -0.222725132 0.36849943 1.49303248 -0.34174718 -0.343420451 5.284808607 -0.358156896 -0.222931558 -0.401391167 -0.412478715 9 | ERBB2 0.906642406 -0.684771423 0.015261254 0.16056792 0.365002113 -0.564392699 0.169072827 -0.035192496 -0.031210405 0.447742443 0.544075103 0.280008477 -0.066278222 -0.225814318 4.103496507 1.219691566 -0.245022001 -0.681552658 -0.304817333 -0.511212295 -1.100056017 1.335983295 -0.500561544 0.721259819 0.284747072 0.232812724 -0.796930101 -0.156381455 1.503853721 10 | ERBB4 -0.452907052 -0.392790536 -0.374173515 -0.527418493 -0.320103334 -0.560657219 -0.312847509 -0.463903623 -0.304652329 -0.30897114 -0.331935876 4.098930821 -0.413942149 -0.501418917 1.256164876 -0.12356019 -0.425577927 -0.36998588 -0.054684881 -0.484730631 -0.419739472 -0.432412432 0.143245619 -0.266932489 -0.340860307 -0.231847291 -0.448292539 -0.42868169 -0.615936889 11 | AAK1 3.579051735 0.92330807 -0.651094367 0.952743833 -0.212733397 0.006074527 -0.121038246 0.083769063 -0.722678214 1.669410989 -0.247600883 -0.284623649 -0.687716717 -0.320883885 -0.93370415 -0.309230053 0.544870152 0.824029397 -0.087291924 -0.973905867 -0.308282983 0.822145704 -0.72904308 -0.088865731 -0.29848499 -0.451367112 -1.134040733 0.379230443 1.491612577 12 | SRPK3 -0.582761335 -0.706379425 0.364313301 -0.483011414 -0.71307719 -0.048548064 -0.527944549 0.337501769 -0.656781635 -0.323318624 -0.432950623 -0.414799098 -1.02570516 -0.861415433 0.113447131 -0.110117735 -0.493510825 0.148841502 -0.341096914 -0.373760525 5.16013802 -0.204906932 -0.465574863 -0.170405491 0.046286803 -0.100887639 0.936150906 -0.15980844 -0.846857677 13 | STK39 -0.58688791 -0.186685902 -3.51852921 0.250628834 -0.477773537 -0.62381107 -0.92202388 -0.55383453 0.018847867 1.267644557 0.243732055 -0.233528273 0.070726356 -0.256360198 1.741001607 0.168247379 -0.245974299 0.014972759 -0.537623787 0.259364957 1.303190492 1.043208024 -1.021094946 -0.097444212 -0.679290593 0.132592576 -0.440607517 -0.21005684 0.651311995 14 | GRK4 -0.693639785 -0.357559299 -0.903861262 -0.810450279 0.293775461 1.012469252 -0.1044623 -0.573757161 -0.629467998 -1.131138938 -0.401984075 -0.672554564 -1.182791974 -1.00138041 -1.020216093 -0.437799213 -0.103783602 -0.387565928 0.386471772 -0.524742175 3.627070248 -0.846550806 -0.473736661 0.443388919 0.766135257 0.193683015 -0.614757268 -0.382906171 -0.864410411 15 | TBK1 0.327203594 0.857319301 -1.397356596 -0.226683585 -0.986051455 0.438343505 0.095527129 5.598772308 0.535025797 -0.057479225 -0.089086932 -0.473291011 2.070252907 0.201409888 1.134728133 0.095734104 0.22916067 0.566649702 1.011889663 0.902342556 0.735510304 -0.493538517 3.176918602 0.490045737 -0.693495689 -0.183704878 -0.182356844 0.744728769 -0.311064392 16 | INSRR 0.331108191 -0.467978397 0.681112329 -1.195914121 -0.538461957 3.616542204 0.094919881 0.527357553 -0.160478312 
-0.940444939 -1.025689676 0.053722044 0.081275611 -0.616337936 -0.48042057 0.620903382 -0.723793064 -0.759642991 -0.744900744 -0.43363819 -0.804929849 0.429022269 0.765012989 0.400356525 0.207741849 1.474373008 0.888734282 0.387715581 -0.623678298 17 | IRAK1 0.141183837 0.788608352 0.51421388 0.528255597 0.906234597 0.050158065 -0.843341382 1.44384296 -0.253699343 0.562537497 -0.505923659 -6.621630156 -1.253907087 0.947883556 0.394657662 0.613122698 -0.223205762 0.905373736 0.679301554 0.859789222 -0.022354553 1.046726052 0.194471 0.359912834 -0.806348234 1.586619876 0.311985824 0.544470027 1.479598537 18 | KDR -0.524309949 -0.285994749 -0.484871681 0.176655189 -0.139711627 -0.352978397 -0.192529854 -0.601257973 -0.427323427 0.459403791 -0.383001 -0.571316753 -0.387982572 -0.353945118 -0.24031472 -0.305685522 0.456864915 -0.134886653 2.136583182 -0.463127859 -0.600408393 4.430955918 -0.374772774 -0.206853106 1.756104521 0.832887289 -0.314392176 -0.31373742 0.493420283 19 | NPR1 0.509592174 0.464774315 0.275495704 -0.01882253 -0.005537792 -0.457197866 3.408253083 -0.430407643 -0.754186082 -0.836530855 -0.277053957 -0.54257843 0.850439577 -0.298981449 -0.169394809 0.254783369 -0.427821755 -0.1389465 0.618069135 1.926349897 -0.305399918 -0.535939215 -0.078661666 6.267854465 -0.57293844 -0.302168609 0.72307512 0.611863003 -0.2145995 20 | PAK3 -0.554447111 -0.145753485 0.019807701 -0.634915727 -0.493766887 -0.587644968 -0.223714996 1.385049476 -0.346100755 0.254536609 0.057887318 -0.593081366 -0.47065185 -0.753944095 -0.044570503 -0.334597636 0.339587788 -0.135201173 -0.111896921 4.913848539 -0.542750046 -0.27783299 -0.689097582 -0.216688006 1.127699857 0.03588747 -0.070416156 -0.553268062 -0.977427689 21 | PDGFRA -0.530831743 -0.260873607 -0.461134617 -0.389056188 -0.512555424 -0.513202268 -0.337550397 -0.449953768 -0.184845421 -0.565880657 -0.376356202 -0.285266104 -0.570505879 3.598950655 -0.477052971 -0.364449691 -0.648917127 -0.390363086 1.117877995 -0.464086586 -0.456431016 -0.571010056 -0.456668794 -0.58230495 -0.426192704 -0.411161613 -0.455618608 -0.297498019 -0.47141323 22 | PDK4 -0.643246331 0.052021433 -0.735006626 0.041068843 -0.062094125 5.477714716 1.256686967 -0.136401851 0.577266871 -0.60002565 0.087671916 -0.560959779 -0.56490393 -0.629261602 -0.214226487 0.09929963 -0.095715004 -0.632856345 1.09320354 0.386976419 -0.374720076 -0.564957229 1.680895489 0.508498891 0.916604247 -0.607497709 0.58989289 -0.122764837 -0.533651064 23 | ULK4 -0.693868027 0.57619653 -0.488541037 -0.60094858 1.139598497 -0.024286993 -0.288194559 -0.499250839 0.103697413 -1.044716157 -1.02427507 3.023003444 3.859028873 0.742978442 -0.352722819 0.069205383 -1.117741919 0.651553716 -0.364768511 -0.5472691 -0.735728719 -0.623797671 -0.105370633 -0.139431549 -0.055372648 0.774449831 0.538987341 0.35514519 -1.45386407 24 | PRKCE 0.006531886 0.564826732 3.695318524 0.316255033 -0.268737774 0.936461505 0.2291517 0.649579007 -0.330103595 -0.504534505 0.264729237 -0.977228043 0.493632169 -0.401821398 -0.286231779 0.143371278 -0.360231532 0.340762717 0.633162512 -0.710530502 -1.334690191 0.158108045 -0.347820435 -0.074497061 -0.970507716 -0.26479443 -0.298648517 -0.10090872 -0.11742112 25 | PRKG2 -0.185695405 -0.173758799 0.084357105 1.826502656 0.00816719 -1.102148634 0.299002536 0.458848186 0.292508806 0.110508201 0.083592283 -0.494333063 -0.117947546 -0.539712481 -0.106334279 -0.403083002 -0.789473381 1.041787363 1.70041072 -0.293951867 4.839524758 1.015480815 0.841188534 -0.620389764 -0.565583764 -0.262366184 
0.226425315 -0.048000565 1.126249373 26 | MAPK4 0.184462349 -0.526037871 0.432087272 -0.882311913 0.246356093 0.858754521 0.052858019 -1.118340603 -0.846948816 -0.778824075 3.525192777 -1.872745007 -0.779756435 -1.039639399 -0.59333431 0.402156007 -1.387426464 -0.145435051 -0.46497243 -0.221064461 -0.861483648 0.125415634 -0.191849116 2.374460297 -0.74142144 0.7654394 1.029796862 0.03307866 0.44066582 27 | MAPK11 1.760301448 -0.912259652 -1.163345889 -0.965891664 -0.795153414 -0.616300339 -1.360743997 -1.448291877 -0.024088935 -1.188868793 -0.229906845 2.181489143 -1.154435684 6.28292787 -0.303782002 -0.165568925 -1.126153349 1.678721355 -1.683560793 -0.864063548 -0.025445472 1.890946219 0.667805988 -0.625764381 -1.063340313 3.222816803 -0.001359619 -0.203661756 0.187669924 28 | STK31 -0.07364355 -0.103789279 -0.171304836 0.351910065 0.63677969 -0.136732984 0.356830815 3.889115824 0.645442526 1.366358918 0.995319244 5.608685402 1.101919141 -0.554900568 0.087820649 0.061305127 1.931275557 -0.692417574 -0.481807702 -0.16288735 -0.538298189 2.440412245 0.804274605 -0.605526195 1.788457016 -0.376520922 0.35819202 0.164487781 3.719306763 29 | GRK1 -0.751526741 0.49762292 -0.142534658 -0.882124083 -1.151282849 2.307907188 -0.12032085 -0.351269532 -1.526178564 -0.753268428 3.600861739 -1.223995853 -0.607229424 -0.027417898 0.190161632 0.610550408 0.149796331 -0.122879865 0.247865963 -0.404833708 0.736929754 -0.944275068 -0.078919294 0.661648005 -0.244948779 3.051534602 -0.107365228 0.367536408 -1.517985824 30 | ROS1 -0.31236414 0.701257089 0.47520812 -0.585297054 -0.122694283 -0.866875137 0.367939523 -0.481103706 2.072237711 10.29186436 1.298805701 -0.628175917 -0.173084375 -0.02710755 0.355169073 0.470456905 0.121400231 0.374924602 -0.278307341 -0.553746266 -0.935156558 -0.042420296 -0.479479902 -0.332400886 -0.710017011 1.873931755 0.204554429 -0.32315246 0.187572521 31 | MAP2K4 0.11931136 0.593670684 0.489152771 0.841683345 1.064673748 0.095113499 1.050152022 1.891488427 -5.5283552 0.64306832 -1.100026181 0.765710935 1.165406655 0.30638633 -1.365894262 0.635492291 -0.377798616 0.521665309 -0.608497433 0.398484128 -0.988354968 1.36349214 1.36269783 -0.112291585 -0.262719995 0.503524059 0.498006014 1.525942005 0.339189212 32 | SRC -0.294263824 -0.618071649 -0.252534114 -0.78660676 -0.228026664 0.977860794 -1.200449832 -0.22037931 -0.240489906 -0.201675468 1.47598938 -0.557000568 -0.502553204 -0.437501309 0.966927023 0.379670097 0.048795579 0.250622869 2.961024714 2.299033235 -1.210659274 0.418655141 1.161954005 -0.15700654 -1.254142937 -0.574558055 -0.662438275 3.702617515 -0.35302723 33 | TGFBR1 -0.000863802 0.735638383 -0.680289747 0.040925843 0.359330228 -1.587400295 -1.041686081 0.071551408 -0.168322665 -1.377303308 3.604539089 -0.004601068 1.527568732 -0.300154707 -0.786135509 -0.138050924 -0.366480418 -0.796970206 -0.030155544 0.803100056 0.683145561 -0.900708154 0.15251077 0.140092011 0.376815421 -1.214319621 1.326197465 1.523070279 -1.312001824 34 | CAMK2B -0.276736819 -0.426080887 -0.160160461 -0.890032771 -0.437405434 0.143897214 -0.573425958 -0.486419381 -0.536963482 -0.657041002 -0.473345418 -0.237475279 -0.669396538 -0.559435302 0.038953301 0.033709721 -0.343587801 -0.513218087 -0.592303313 -0.431221835 5.339202897 -0.493778587 -0.645000361 -0.477984867 -0.401579746 -0.621782124 -0.249394627 -0.303365249 -0.922343302 35 | STK24 -0.31807579 -0.814110809 0.646545188 0.26837169 -9.425120961 -1.073853473 -2.049589626 -0.346921024 0.997283181 0.300619253 -0.543103864 -1.150792172 
-2.283061167 -0.162802216 -1.053859713 -1.377541743 -0.288349474 -0.922266884 -1.123953091 -0.762953893 -0.687357148 0.28991073 0.317576672 -0.345565515 0.541683 0.009754099 0.73792006 -0.624271752 0.100532547 36 | DCLK3 -0.670177714 3.224533501 0.145509552 0.107432319 -1.120492739 0.288890539 1.549545918 -0.342665051 -0.017402855 -0.420002244 -0.361387453 -1.264272075 -0.794507765 -0.619944678 -0.338767802 -0.148529478 -1.078879645 0.130939014 -1.307815313 -1.818798474 3.683694337 0.920647357 -0.847056974 -0.343798498 -1.21552566 -0.853845334 -0.357215055 -0.043911541 -0.955847309 37 | LATS1 -0.695252888 4.299877134 -0.175587126 -0.061022137 -0.391646018 3.385451038 0.345114288 -0.505734993 -0.482953864 -0.081815586 -0.928486879 0.976209137 0.099021487 2.494690556 -1.088742779 0.437174751 -0.507169467 2.028724319 -0.507954247 0.143506281 -1.19702953 0.610379518 0.095879151 -0.663118727 0.50821984 -0.741815419 2.38531026 0.354750355 0.658437634 38 | NEK9 -0.337849025 -0.535265918 0.803160459 0.275911465 0.981343049 -0.748451144 -0.092431408 -0.326477104 -0.381243917 -0.575343824 -0.63351617 -0.380961411 -1.720616197 -0.85605361 -0.580950374 0.373293116 0.905490886 0.135705555 1.107780656 -0.545183144 0.475561701 0.016687596 -0.172178219 0.585186686 -0.40480014 -3.997318149 0.711029765 -0.470884061 0.354386296 39 | MYLK3 -0.368173217 0.209192446 0.266317555 -0.100656799 -0.336791718 -0.060827204 -0.199021599 -0.765882671 -0.071476548 -0.4402703 -0.3548684 3.468121376 5.853726714 -0.465135408 0.074434692 7.085199705 -0.399050575 -0.334999773 -0.623071147 -0.406230833 0.939058116 -0.269533885 0.117950503 0.1975473 -0.365407931 -0.056856473 0.001983212 0.081609959 -0.603299855 -------------------------------------------------------------------------------- /txt/rc_ptms.txt: -------------------------------------------------------------------------------- 1 | Cell Line: H1650 Cell Line: H23 Cell Line: CAL-12T Cell Line: H358 Cell Line: H1975 Cell Line: HCC15 Cell Line: H1355 Cell Line: HCC827 Cell Line: H2405 Cell Line: HCC78 Cell Line: H1666 Cell Line: H661 Cell Line: H838 Cell Line: H1703 Cell Line: CALU-3 Cell Line: H2342 Cell Line: H2228 Cell Line: H1299 Cell Line: H1792 Cell Line: H460 Cell Line: H2106 Cell Line: H441 Cell Line: H1944 Cell Line: H1437 Cell Line: H1734 Cell Line: LOU-NH91 Cell Line: HCC44 Cell Line: A549 Cell Line: H1781 2 | Category: two Category: two Category: two Category: one Category: two Category: two Category: three Category: one Category: five Category: five Category: four Category: five Category: five Category: five Category: four Category: four Category: one Category: three Category: three Category: three Category: four Category: one Category: three Category: four Category: one Category: five Category: four Category: four Category: one 3 | Gender: Male Gender: Male Gender: Male Gender: Male Gender: Female Gender: Male Gender: Male Gender: Female Gender: Male Gender: Male Gender: Female Gender: Male Gender: Male Gender: Male Gender: Male Gender: Female Gender: Female Gender: Male Gender: Male Gender: Male Gender: Male Gender: Male Gender: Female Gender: Male Gender: Female Gender: Female Gender: Female Gender: Male Gender: Female 4 | Gene: CDK4_ptm-info Gene Type: Interesting -0.792803571 0.527687127 0.000622536 0.356722594 0.933286088 -0.131728538 0.808451944 4.240884801 -0.540231391 -0.981456952 -0.84689892 -0.252795921 0.114189581 -0.06649884 0.149218809 1.351263924 0.645867212 0.60561098 3.232454573 0.342634572 -0.430912324 -0.40590567 0.199563989 -1.122536294 
2.210334571 0.405126315 -0.089763159 0.405126315 0.340012773 5 | Gene: LMTK3_ptm-info Gene Type: Not Interesting 0.17762054 -0.016061489 5.422113833 1.307039675 0.355814985 0.276904994 0.483153915 -0.240495821 1.336445996 1.149618502 0.361412978 -0.380518938 -0.213541004 -0.471938639 -0.620858723 -0.163637058 -0.487256142 -0.029569688 -0.232057778 -0.669036939 -0.449241698 1.158930406 0.511962022 2.370834155 0.262893885 -0.513128895 -0.501210068 0.439277561 -0.342460508 6 | Gene: LRRK2_ptm-info Gene Type: Not Interesting -0.697876151 -0.555610265 -0.360497559 -0.460236731 -0.680760697 -0.169463518 1.715708875 -0.517104823 0.184987709 0.8106597 -0.440334448 -0.621052026 -0.086803358 -0.753966225 -0.401972037 -0.562086752 -0.560644597 0.542301381 -0.382639145 -0.377853523 -0.713472923 -0.377609368 4.308904581 -0.638131949 -0.556114063 -0.318145763 -0.489582714 1.677376527 -0.682790464 7 | Gene: UHMK1_ptm-info Gene Type: Not Interesting 0.850546518 -0.263279907 0.179253031 0.398646721 1.537663802 0.505291411 0.902366491 -0.16628803 0.630730564 0.399448283 0.847171367 -0.442268094 0.44368676 1.552969029 1.110283483 -0.326698072 -0.405267374 0.663747183 0.424470033 0.283221899 -4.243973921 0.718315578 1.747343933 -1.020927175 0.305028514 1.47174613 0.048902278 -0.255283556 0.548224573 8 | Gene: EGFR_ptm-info Gene Type: Interesting 1.412416216 0.018987506 0.902251622 -0.17813747 0.781819022 0.211815895 -0.023427175 3.557295952 1.173783556 -0.012362164 0.769782484 -0.681031743 -1.047375389 0.652065499 0.172316691 2.072433469 1.135709377 -0.169977181 0.881067136 -0.486159025 -1.451838026 0.371237737 -0.581665325 -0.126356157 0.241004724 1.06526919 0.974531796 0.668645091 0.05696489 9 | Gene: STK32A_ptm-info Gene Type: Interesting -0.388039665 -0.592626614 -0.24413651 0.740364734 3.023348415 -0.433985412 -0.630124457 1.156531983 0.433696213 3.84950782 -0.225425742 -0.656106808 -0.311953357 -0.397450226 1.044025538 -0.247816912 3.640524345 -0.59251039 0.514666245 -0.45396994 1.649737631 3.366020313 -0.430502237 -0.295312303 2.824551497 -0.014275115 -0.410477794 -0.229717784 3.709828616 10 | Gene: NRK_ptm-info Gene Type: Interesting 1.408537135 -0.017369325 -0.367127962 0.313253548 -0.16288686 0.027411933 -0.281351556 5.813846489 -0.161706584 0.472386752 -0.33979584 0.669956625 -0.2596391 -0.386601295 -0.293654593 4.390499721 -0.420942214 -0.402955154 -0.346809494 -0.222725132 0.36849943 1.49303248 -0.34174718 -0.343420451 5.284808607 -0.358156896 -0.222931558 -0.401391167 -0.412478715 11 | Gene: ERBB2_ptm-info Gene Type: Not Interesting 0.906642406 -0.684771423 0.015261254 0.16056792 0.365002113 -0.564392699 0.169072827 -0.035192496 -0.031210405 0.447742443 0.544075103 0.280008477 -0.066278222 -0.225814318 4.103496507 1.219691566 -0.245022001 -0.681552658 -0.304817333 -0.511212295 -1.100056017 1.335983295 -0.500561544 0.721259819 0.284747072 0.232812724 -0.796930101 -0.156381455 1.503853721 12 | Gene: ERBB4_ptm-info Gene Type: Not Interesting -0.452907052 -0.392790536 -0.374173515 -0.527418493 -0.320103334 -0.560657219 -0.312847509 -0.463903623 -0.304652329 -0.30897114 -0.331935876 4.098930821 -0.413942149 -0.501418917 1.256164876 -0.12356019 -0.425577927 -0.36998588 -0.054684881 -0.484730631 -0.419739472 -0.432412432 0.143245619 -0.266932489 -0.340860307 -0.231847291 -0.448292539 -0.42868169 -0.615936889 13 | Gene: AAK1_ptm-info Gene Type: Not Interesting 3.579051735 0.92330807 -0.651094367 0.952743833 -0.212733397 0.006074527 -0.121038246 0.083769063 -0.722678214 1.669410989 -0.247600883 
-0.284623649 -0.687716717 -0.320883885 -0.93370415 -0.309230053 0.544870152 0.824029397 -0.087291924 -0.973905867 -0.308282983 0.822145704 -0.72904308 -0.088865731 -0.29848499 -0.451367112 -1.134040733 0.379230443 1.491612577 14 | Gene: SRPK3_ptm-info Gene Type: Not Interesting -0.582761335 -0.706379425 0.364313301 -0.483011414 -0.71307719 -0.048548064 -0.527944549 0.337501769 -0.656781635 -0.323318624 -0.432950623 -0.414799098 -1.02570516 -0.861415433 0.113447131 -0.110117735 -0.493510825 0.148841502 -0.341096914 -0.373760525 5.16013802 -0.204906932 -0.465574863 -0.170405491 0.046286803 -0.100887639 0.936150906 -0.15980844 -0.846857677 15 | Gene: STK39_ptm-info Gene Type: Interesting -0.58688791 -0.186685902 -3.51852921 0.250628834 -0.477773537 -0.62381107 -0.92202388 -0.55383453 0.018847867 1.267644557 0.243732055 -0.233528273 0.070726356 -0.256360198 1.741001607 0.168247379 -0.245974299 0.014972759 -0.537623787 0.259364957 1.303190492 1.043208024 -1.021094946 -0.097444212 -0.679290593 0.132592576 -0.440607517 -0.21005684 0.651311995 16 | Gene: GRK4_ptm-info Gene Type: Not Interesting -0.693639785 -0.357559299 -0.903861262 -0.810450279 0.293775461 1.012469252 -0.1044623 -0.573757161 -0.629467998 -1.131138938 -0.401984075 -0.672554564 -1.182791974 -1.00138041 -1.020216093 -0.437799213 -0.103783602 -0.387565928 0.386471772 -0.524742175 3.627070248 -0.846550806 -0.473736661 0.443388919 0.766135257 0.193683015 -0.614757268 -0.382906171 -0.864410411 17 | Gene: TBK1_ptm-info Gene Type: Not Interesting 0.327203594 0.857319301 -1.397356596 -0.226683585 -0.986051455 0.438343505 0.095527129 5.598772308 0.535025797 -0.057479225 -0.089086932 -0.473291011 2.070252907 0.201409888 1.134728133 0.095734104 0.22916067 0.566649702 1.011889663 0.902342556 0.735510304 -0.493538517 3.176918602 0.490045737 -0.693495689 -0.183704878 -0.182356844 0.744728769 -0.311064392 18 | Gene: INSRR_ptm-info Gene Type: Not Interesting 0.331108191 -0.467978397 0.681112329 -1.195914121 -0.538461957 3.616542204 0.094919881 0.527357553 -0.160478312 -0.940444939 -1.025689676 0.053722044 0.081275611 -0.616337936 -0.48042057 0.620903382 -0.723793064 -0.759642991 -0.744900744 -0.43363819 -0.804929849 0.429022269 0.765012989 0.400356525 0.207741849 1.474373008 0.888734282 0.387715581 -0.623678298 19 | Gene: IRAK1_ptm-info Gene Type: Interesting 0.141183837 0.788608352 0.51421388 0.528255597 0.906234597 0.050158065 -0.843341382 1.44384296 -0.253699343 0.562537497 -0.505923659 -6.621630156 -1.253907087 0.947883556 0.394657662 0.613122698 -0.223205762 0.905373736 0.679301554 0.859789222 -0.022354553 1.046726052 0.194471 0.359912834 -0.806348234 1.586619876 0.311985824 0.544470027 1.479598537 20 | Gene: KDR_ptm-info Gene Type: Not Interesting -0.524309949 -0.285994749 -0.484871681 0.176655189 -0.139711627 -0.352978397 -0.192529854 -0.601257973 -0.427323427 0.459403791 -0.383001 -0.571316753 -0.387982572 -0.353945118 -0.24031472 -0.305685522 0.456864915 -0.134886653 2.136583182 -0.463127859 -0.600408393 4.430955918 -0.374772774 -0.206853106 1.756104521 0.832887289 -0.314392176 -0.31373742 0.493420283 21 | Gene: NPR1_ptm-info Gene Type: Interesting 0.509592174 0.464774315 0.275495704 -0.01882253 -0.005537792 -0.457197866 3.408253083 -0.430407643 -0.754186082 -0.836530855 -0.277053957 -0.54257843 0.850439577 -0.298981449 -0.169394809 0.254783369 -0.427821755 -0.1389465 0.618069135 1.926349897 -0.305399918 -0.535939215 -0.078661666 6.267854465 -0.57293844 -0.302168609 0.72307512 0.611863003 -0.2145995 22 | Gene: PAK3_ptm-info Gene Type: Not 
Interesting -0.554447111 -0.145753485 0.019807701 -0.634915727 -0.493766887 -0.587644968 -0.223714996 1.385049476 -0.346100755 0.254536609 0.057887318 -0.593081366 -0.47065185 -0.753944095 -0.044570503 -0.334597636 0.339587788 -0.135201173 -0.111896921 4.913848539 -0.542750046 -0.27783299 -0.689097582 -0.216688006 1.127699857 0.03588747 -0.070416156 -0.553268062 -0.977427689 23 | Gene: PDGFRA_ptm-info Gene Type: Interesting -0.530831743 -0.260873607 -0.461134617 -0.389056188 -0.512555424 -0.513202268 -0.337550397 -0.449953768 -0.184845421 -0.565880657 -0.376356202 -0.285266104 -0.570505879 3.598950655 -0.477052971 -0.364449691 -0.648917127 -0.390363086 1.117877995 -0.464086586 -0.456431016 -0.571010056 -0.456668794 -0.58230495 -0.426192704 -0.411161613 -0.455618608 -0.297498019 -0.47141323 24 | Gene: PDK4_ptm-info Gene Type: Not Interesting -0.643246331 0.052021433 -0.735006626 0.041068843 -0.062094125 5.477714716 1.256686967 -0.136401851 0.577266871 -0.60002565 0.087671916 -0.560959779 -0.56490393 -0.629261602 -0.214226487 0.09929963 -0.095715004 -0.632856345 1.09320354 0.386976419 -0.374720076 -0.564957229 1.680895489 0.508498891 0.916604247 -0.607497709 0.58989289 -0.122764837 -0.533651064 25 | Gene: ULK4_ptm-info Gene Type: Interesting -0.693868027 0.57619653 -0.488541037 -0.60094858 1.139598497 -0.024286993 -0.288194559 -0.499250839 0.103697413 -1.044716157 -1.02427507 3.023003444 3.859028873 0.742978442 -0.352722819 0.069205383 -1.117741919 0.651553716 -0.364768511 -0.5472691 -0.735728719 -0.623797671 -0.105370633 -0.139431549 -0.055372648 0.774449831 0.538987341 0.35514519 -1.45386407 26 | Gene: PRKCE_ptm-info Gene Type: Not Interesting 0.006531886 0.564826732 3.695318524 0.316255033 -0.268737774 0.936461505 0.2291517 0.649579007 -0.330103595 -0.504534505 0.264729237 -0.977228043 0.493632169 -0.401821398 -0.286231779 0.143371278 -0.360231532 0.340762717 0.633162512 -0.710530502 -1.334690191 0.158108045 -0.347820435 -0.074497061 -0.970507716 -0.26479443 -0.298648517 -0.10090872 -0.11742112 27 | Gene: PRKG2_ptm-info Gene Type: Not Interesting -0.185695405 -0.173758799 0.084357105 1.826502656 0.00816719 -1.102148634 0.299002536 0.458848186 0.292508806 0.110508201 0.083592283 -0.494333063 -0.117947546 -0.539712481 -0.106334279 -0.403083002 -0.789473381 1.041787363 1.70041072 -0.293951867 4.839524758 1.015480815 0.841188534 -0.620389764 -0.565583764 -0.262366184 0.226425315 -0.048000565 1.126249373 28 | Gene: MAPK4_ptm-info Gene Type: Interesting 0.184462349 -0.526037871 0.432087272 -0.882311913 0.246356093 0.858754521 0.052858019 -1.118340603 -0.846948816 -0.778824075 3.525192777 -1.872745007 -0.779756435 -1.039639399 -0.59333431 0.402156007 -1.387426464 -0.145435051 -0.46497243 -0.221064461 -0.861483648 0.125415634 -0.191849116 2.374460297 -0.74142144 0.7654394 1.029796862 0.03307866 0.44066582 29 | Gene: MAPK11_ptm-info Gene Type: Interesting 1.760301448 -0.912259652 -1.163345889 -0.965891664 -0.795153414 -0.616300339 -1.360743997 -1.448291877 -0.024088935 -1.188868793 -0.229906845 2.181489143 -1.154435684 6.28292787 -0.303782002 -0.165568925 -1.126153349 1.678721355 -1.683560793 -0.864063548 -0.025445472 1.890946219 0.667805988 -0.625764381 -1.063340313 3.222816803 -0.001359619 -0.203661756 0.187669924 30 | Gene: STK31_ptm-info Gene Type: Interesting -0.07364355 -0.103789279 -0.171304836 0.351910065 0.63677969 -0.136732984 0.356830815 3.889115824 0.645442526 1.366358918 0.995319244 5.608685402 1.101919141 -0.554900568 0.087820649 0.061305127 1.931275557 -0.692417574 -0.481807702 
-0.16288735 -0.538298189 2.440412245 0.804274605 -0.605526195 1.788457016 -0.376520922 0.35819202 0.164487781 3.719306763 31 | Gene: GRK1_ptm-info Gene Type: Not Interesting -0.751526741 0.49762292 -0.142534658 -0.882124083 -1.151282849 2.307907188 -0.12032085 -0.351269532 -1.526178564 -0.753268428 3.600861739 -1.223995853 -0.607229424 -0.027417898 0.190161632 0.610550408 0.149796331 -0.122879865 0.247865963 -0.404833708 0.736929754 -0.944275068 -0.078919294 0.661648005 -0.244948779 3.051534602 -0.107365228 0.367536408 -1.517985824 32 | Gene: ROS1_ptm-info Gene Type: Interesting -0.31236414 0.701257089 0.47520812 -0.585297054 -0.122694283 -0.866875137 0.367939523 -0.481103706 2.072237711 10.29186436 1.298805701 -0.628175917 -0.173084375 -0.02710755 0.355169073 0.470456905 0.121400231 0.374924602 -0.278307341 -0.553746266 -0.935156558 -0.042420296 -0.479479902 -0.332400886 -0.710017011 1.873931755 0.204554429 -0.32315246 0.187572521 33 | Gene: MAP2K4_ptm-info Gene Type: Interesting 0.11931136 0.593670684 0.489152771 0.841683345 1.064673748 0.095113499 1.050152022 1.891488427 -5.5283552 0.64306832 -1.100026181 0.765710935 1.165406655 0.30638633 -1.365894262 0.635492291 -0.377798616 0.521665309 -0.608497433 0.398484128 -0.988354968 1.36349214 1.36269783 -0.112291585 -0.262719995 0.503524059 0.498006014 1.525942005 0.339189212 34 | Gene: SRC_ptm-info Gene Type: Interesting -0.294263824 -0.618071649 -0.252534114 -0.78660676 -0.228026664 0.977860794 -1.200449832 -0.22037931 -0.240489906 -0.201675468 1.47598938 -0.557000568 -0.502553204 -0.437501309 0.966927023 0.379670097 0.048795579 0.250622869 2.961024714 2.299033235 -1.210659274 0.418655141 1.161954005 -0.15700654 -1.254142937 -0.574558055 -0.662438275 3.702617515 -0.35302723 35 | Gene: TGFBR1_ptm-info Gene Type: Interesting -0.000863802 0.735638383 -0.680289747 0.040925843 0.359330228 -1.587400295 -1.041686081 0.071551408 -0.168322665 -1.377303308 3.604539089 -0.004601068 1.527568732 -0.300154707 -0.786135509 -0.138050924 -0.366480418 -0.796970206 -0.030155544 0.803100056 0.683145561 -0.900708154 0.15251077 0.140092011 0.376815421 -1.214319621 1.326197465 1.523070279 -1.312001824 36 | Gene: CAMK2B_ptm-info Gene Type: Not Interesting -0.276736819 -0.426080887 -0.160160461 -0.890032771 -0.437405434 0.143897214 -0.573425958 -0.486419381 -0.536963482 -0.657041002 -0.473345418 -0.237475279 -0.669396538 -0.559435302 0.038953301 0.033709721 -0.343587801 -0.513218087 -0.592303313 -0.431221835 5.339202897 -0.493778587 -0.645000361 -0.477984867 -0.401579746 -0.621782124 -0.249394627 -0.303365249 -0.922343302 37 | Gene: STK24_ptm-info Gene Type: Interesting -0.31807579 -0.814110809 0.646545188 0.26837169 -9.425120961 -1.073853473 -2.049589626 -0.346921024 0.997283181 0.300619253 -0.543103864 -1.150792172 -2.283061167 -0.162802216 -1.053859713 -1.377541743 -0.288349474 -0.922266884 -1.123953091 -0.762953893 -0.687357148 0.28991073 0.317576672 -0.345565515 0.541683 0.009754099 0.73792006 -0.624271752 0.100532547 38 | Gene: DCLK3_ptm-info Gene Type: Not Interesting -0.670177714 3.224533501 0.145509552 0.107432319 -1.120492739 0.288890539 1.549545918 -0.342665051 -0.017402855 -0.420002244 -0.361387453 -1.264272075 -0.794507765 -0.619944678 -0.338767802 -0.148529478 -1.078879645 0.130939014 -1.307815313 -1.818798474 3.683694337 0.920647357 -0.847056974 -0.343798498 -1.21552566 -0.853845334 -0.357215055 -0.043911541 -0.955847309 39 | Gene: LATS1_ptm-info Gene Type: Not Interesting -0.695252888 4.299877134 -0.175587126 -0.061022137 -0.391646018 3.385451038 
0.345114288 -0.505734993 -0.482953864 -0.081815586 -0.928486879 0.976209137 0.099021487 2.494690556 -1.088742779 0.437174751 -0.507169467 2.028724319 -0.507954247 0.143506281 -1.19702953 0.610379518 0.095879151 -0.663118727 0.50821984 -0.741815419 2.38531026 0.354750355 0.658437634 40 | Gene: NEK9_ptm-info Gene Type: Not Interesting -0.337849025 -0.535265918 0.803160459 0.275911465 0.981343049 -0.748451144 -0.092431408 -0.326477104 -0.381243917 -0.575343824 -0.63351617 -0.380961411 -1.720616197 -0.85605361 -0.580950374 0.373293116 0.905490886 0.135705555 1.107780656 -0.545183144 0.475561701 0.016687596 -0.172178219 0.585186686 -0.40480014 -3.997318149 0.711029765 -0.470884061 0.354386296 41 | Gene: MYLK3_ptm-info Gene Type: Not Interesting -0.368173217 0.209192446 0.266317555 -0.100656799 -0.336791718 -0.060827204 -0.199021599 -0.765882671 -0.071476548 -0.4402703 -0.3548684 3.468121376 5.853726714 -0.465135408 0.074434692 7.085199705 -0.399050575 -0.334999773 -0.623071147 -0.406230833 0.939058116 -0.269533885 0.117950503 0.1975473 -0.365407931 -0.056856473 0.001983212 0.081609959 -0.603299855 -------------------------------------------------------------------------------- /txt/rc_two_cats.txt: -------------------------------------------------------------------------------- 1 | Cell Line: H1650 Cell Line: H23 Cell Line: CAL-12T Cell Line: H358 Cell Line: H1975 Cell Line: HCC15 Cell Line: H1355 Cell Line: HCC827 Cell Line: H2405 Cell Line: HCC78 Cell Line: H1666 Cell Line: H661 Cell Line: H838 Cell Line: H1703 Cell Line: CALU-3 Cell Line: H2342 Cell Line: H2228 Cell Line: H1299 Cell Line: H1792 Cell Line: H460 Cell Line: H2106 Cell Line: H441 Cell Line: H1944 Cell Line: H1437 Cell Line: H1734 Cell Line: LOU-NH91 Cell Line: HCC44 Cell Line: A549 Cell Line: H1781 2 | Category: two Category: two Category: two Category: one Category: two Category: two Category: three Category: one Category: five Category: five Category: four Category: five Category: five Category: five Category: four Category: four Category: one Category: three Category: three Category: three Category: four Category: one Category: three Category: four Category: one Category: five Category: four Category: four Category: one 3 | Gender: Male Gender: Male Gender: Male Gender: Male Gender: Female Gender: Male Gender: Male Gender: Female Gender: Male Gender: Male Gender: Female Gender: Male Gender: Male Gender: Male Gender: Male Gender: Female Gender: Female Gender: Male Gender: Male Gender: Male Gender: Male Gender: Male Gender: Female Gender: Male Gender: Female Gender: Female Gender: Female Gender: Male Gender: Female 4 | Gene: CDK4 Gene Type: Interesting -0.792803571 0.527687127 0.000622536 0.356722594 0.933286088 -0.131728538 0.808451944 4.240884801 -0.540231391 -0.981456952 -0.84689892 -0.252795921 0.114189581 -0.06649884 0.149218809 1.351263924 0.645867212 0.60561098 3.232454573 0.342634572 -0.430912324 -0.40590567 0.199563989 -1.122536294 2.210334571 0.405126315 -0.089763159 0.405126315 0.340012773 5 | Gene: LMTK3 Gene Type: Not Interesting 0.17762054 -0.016061489 5.422113833 1.307039675 0.355814985 0.276904994 0.483153915 -0.240495821 1.336445996 1.149618502 0.361412978 -0.380518938 -0.213541004 -0.471938639 -0.620858723 -0.163637058 -0.487256142 -0.029569688 -0.232057778 -0.669036939 -0.449241698 1.158930406 0.511962022 2.370834155 0.262893885 -0.513128895 -0.501210068 0.439277561 -0.342460508 6 | Gene: LRRK2 Gene Type: Not Interesting -0.697876151 -0.555610265 -0.360497559 -0.460236731 -0.680760697 -0.169463518 1.715708875 
-0.517104823 0.184987709 0.8106597 -0.440334448 -0.621052026 -0.086803358 -0.753966225 -0.401972037 -0.562086752 -0.560644597 0.542301381 -0.382639145 -0.377853523 -0.713472923 -0.377609368 4.308904581 -0.638131949 -0.556114063 -0.318145763 -0.489582714 1.677376527 -0.682790464 7 | Gene: UHMK1 Gene Type: Not Interesting 0.850546518 -0.263279907 0.179253031 0.398646721 1.537663802 0.505291411 0.902366491 -0.16628803 0.630730564 0.399448283 0.847171367 -0.442268094 0.44368676 1.552969029 1.110283483 -0.326698072 -0.405267374 0.663747183 0.424470033 0.283221899 -4.243973921 0.718315578 1.747343933 -1.020927175 0.305028514 1.47174613 0.048902278 -0.255283556 0.548224573 8 | Gene: EGFR Gene Type: Interesting 1.412416216 0.018987506 0.902251622 -0.17813747 0.781819022 0.211815895 -0.023427175 3.557295952 1.173783556 -0.012362164 0.769782484 -0.681031743 -1.047375389 0.652065499 0.172316691 2.072433469 1.135709377 -0.169977181 0.881067136 -0.486159025 -1.451838026 0.371237737 -0.581665325 -0.126356157 0.241004724 1.06526919 0.974531796 0.668645091 0.05696489 9 | Gene: STK32A Gene Type: Interesting -0.388039665 -0.592626614 -0.24413651 0.740364734 3.023348415 -0.433985412 -0.630124457 1.156531983 0.433696213 3.84950782 -0.225425742 -0.656106808 -0.311953357 -0.397450226 1.044025538 -0.247816912 3.640524345 -0.59251039 0.514666245 -0.45396994 1.649737631 3.366020313 -0.430502237 -0.295312303 2.824551497 -0.014275115 -0.410477794 -0.229717784 3.709828616 10 | Gene: NRK Gene Type: Interesting 1.408537135 -0.017369325 -0.367127962 0.313253548 -0.16288686 0.027411933 -0.281351556 5.813846489 -0.161706584 0.472386752 -0.33979584 0.669956625 -0.2596391 -0.386601295 -0.293654593 4.390499721 -0.420942214 -0.402955154 -0.346809494 -0.222725132 0.36849943 1.49303248 -0.34174718 -0.343420451 5.284808607 -0.358156896 -0.222931558 -0.401391167 -0.412478715 11 | Gene: ERBB2 Gene Type: Not Interesting 0.906642406 -0.684771423 0.015261254 0.16056792 0.365002113 -0.564392699 0.169072827 -0.035192496 -0.031210405 0.447742443 0.544075103 0.280008477 -0.066278222 -0.225814318 4.103496507 1.219691566 -0.245022001 -0.681552658 -0.304817333 -0.511212295 -1.100056017 1.335983295 -0.500561544 0.721259819 0.284747072 0.232812724 -0.796930101 -0.156381455 1.503853721 12 | Gene: ERBB4 Gene Type: Not Interesting -0.452907052 -0.392790536 -0.374173515 -0.527418493 -0.320103334 -0.560657219 -0.312847509 -0.463903623 -0.304652329 -0.30897114 -0.331935876 4.098930821 -0.413942149 -0.501418917 1.256164876 -0.12356019 -0.425577927 -0.36998588 -0.054684881 -0.484730631 -0.419739472 -0.432412432 0.143245619 -0.266932489 -0.340860307 -0.231847291 -0.448292539 -0.42868169 -0.615936889 13 | Gene: AAK1 Gene Type: Not Interesting 3.579051735 0.92330807 -0.651094367 0.952743833 -0.212733397 0.006074527 -0.121038246 0.083769063 -0.722678214 1.669410989 -0.247600883 -0.284623649 -0.687716717 -0.320883885 -0.93370415 -0.309230053 0.544870152 0.824029397 -0.087291924 -0.973905867 -0.308282983 0.822145704 -0.72904308 -0.088865731 -0.29848499 -0.451367112 -1.134040733 0.379230443 1.491612577 14 | Gene: SRPK3 Gene Type: Not Interesting -0.582761335 -0.706379425 0.364313301 -0.483011414 -0.71307719 -0.048548064 -0.527944549 0.337501769 -0.656781635 -0.323318624 -0.432950623 -0.414799098 -1.02570516 -0.861415433 0.113447131 -0.110117735 -0.493510825 0.148841502 -0.341096914 -0.373760525 5.16013802 -0.204906932 -0.465574863 -0.170405491 0.046286803 -0.100887639 0.936150906 -0.15980844 -0.846857677 15 | Gene: STK39 Gene Type: Interesting -0.58688791 
-0.186685902 -3.51852921 0.250628834 -0.477773537 -0.62381107 -0.92202388 -0.55383453 0.018847867 1.267644557 0.243732055 -0.233528273 0.070726356 -0.256360198 1.741001607 0.168247379 -0.245974299 0.014972759 -0.537623787 0.259364957 1.303190492 1.043208024 -1.021094946 -0.097444212 -0.679290593 0.132592576 -0.440607517 -0.21005684 0.651311995 16 | Gene: GRK4 Gene Type: Not Interesting -0.693639785 -0.357559299 -0.903861262 -0.810450279 0.293775461 1.012469252 -0.1044623 -0.573757161 -0.629467998 -1.131138938 -0.401984075 -0.672554564 -1.182791974 -1.00138041 -1.020216093 -0.437799213 -0.103783602 -0.387565928 0.386471772 -0.524742175 3.627070248 -0.846550806 -0.473736661 0.443388919 0.766135257 0.193683015 -0.614757268 -0.382906171 -0.864410411 17 | Gene: TBK1 Gene Type: Not Interesting 0.327203594 0.857319301 -1.397356596 -0.226683585 -0.986051455 0.438343505 0.095527129 5.598772308 0.535025797 -0.057479225 -0.089086932 -0.473291011 2.070252907 0.201409888 1.134728133 0.095734104 0.22916067 0.566649702 1.011889663 0.902342556 0.735510304 -0.493538517 3.176918602 0.490045737 -0.693495689 -0.183704878 -0.182356844 0.744728769 -0.311064392 18 | Gene: INSRR Gene Type: Not Interesting 0.331108191 -0.467978397 0.681112329 -1.195914121 -0.538461957 3.616542204 0.094919881 0.527357553 -0.160478312 -0.940444939 -1.025689676 0.053722044 0.081275611 -0.616337936 -0.48042057 0.620903382 -0.723793064 -0.759642991 -0.744900744 -0.43363819 -0.804929849 0.429022269 0.765012989 0.400356525 0.207741849 1.474373008 0.888734282 0.387715581 -0.623678298 19 | Gene: IRAK1 Gene Type: Interesting 0.141183837 0.788608352 0.51421388 0.528255597 0.906234597 0.050158065 -0.843341382 1.44384296 -0.253699343 0.562537497 -0.505923659 -6.621630156 -1.253907087 0.947883556 0.394657662 0.613122698 -0.223205762 0.905373736 0.679301554 0.859789222 -0.022354553 1.046726052 0.194471 0.359912834 -0.806348234 1.586619876 0.311985824 0.544470027 1.479598537 20 | Gene: KDR Gene Type: Not Interesting -0.524309949 -0.285994749 -0.484871681 0.176655189 -0.139711627 -0.352978397 -0.192529854 -0.601257973 -0.427323427 0.459403791 -0.383001 -0.571316753 -0.387982572 -0.353945118 -0.24031472 -0.305685522 0.456864915 -0.134886653 2.136583182 -0.463127859 -0.600408393 4.430955918 -0.374772774 -0.206853106 1.756104521 0.832887289 -0.314392176 -0.31373742 0.493420283 21 | Gene: NPR1 Gene Type: Interesting 0.509592174 0.464774315 0.275495704 -0.01882253 -0.005537792 -0.457197866 3.408253083 -0.430407643 -0.754186082 -0.836530855 -0.277053957 -0.54257843 0.850439577 -0.298981449 -0.169394809 0.254783369 -0.427821755 -0.1389465 0.618069135 1.926349897 -0.305399918 -0.535939215 -0.078661666 6.267854465 -0.57293844 -0.302168609 0.72307512 0.611863003 -0.2145995 22 | Gene: PAK3 Gene Type: Not Interesting -0.554447111 -0.145753485 0.019807701 -0.634915727 -0.493766887 -0.587644968 -0.223714996 1.385049476 -0.346100755 0.254536609 0.057887318 -0.593081366 -0.47065185 -0.753944095 -0.044570503 -0.334597636 0.339587788 -0.135201173 -0.111896921 4.913848539 -0.542750046 -0.27783299 -0.689097582 -0.216688006 1.127699857 0.03588747 -0.070416156 -0.553268062 -0.977427689 23 | Gene: PDGFRA Gene Type: Interesting -0.530831743 -0.260873607 -0.461134617 -0.389056188 -0.512555424 -0.513202268 -0.337550397 -0.449953768 -0.184845421 -0.565880657 -0.376356202 -0.285266104 -0.570505879 3.598950655 -0.477052971 -0.364449691 -0.648917127 -0.390363086 1.117877995 -0.464086586 -0.456431016 -0.571010056 -0.456668794 -0.58230495 -0.426192704 -0.411161613 -0.455618608 
-0.297498019 -0.47141323 24 | Gene: PDK4 Gene Type: Not Interesting -0.643246331 0.052021433 -0.735006626 0.041068843 -0.062094125 5.477714716 1.256686967 -0.136401851 0.577266871 -0.60002565 0.087671916 -0.560959779 -0.56490393 -0.629261602 -0.214226487 0.09929963 -0.095715004 -0.632856345 1.09320354 0.386976419 -0.374720076 -0.564957229 1.680895489 0.508498891 0.916604247 -0.607497709 0.58989289 -0.122764837 -0.533651064 25 | Gene: ULK4 Gene Type: Interesting -0.693868027 0.57619653 -0.488541037 -0.60094858 1.139598497 -0.024286993 -0.288194559 -0.499250839 0.103697413 -1.044716157 -1.02427507 3.023003444 3.859028873 0.742978442 -0.352722819 0.069205383 -1.117741919 0.651553716 -0.364768511 -0.5472691 -0.735728719 -0.623797671 -0.105370633 -0.139431549 -0.055372648 0.774449831 0.538987341 0.35514519 -1.45386407 26 | Gene: PRKCE Gene Type: Not Interesting 0.006531886 0.564826732 3.695318524 0.316255033 -0.268737774 0.936461505 0.2291517 0.649579007 -0.330103595 -0.504534505 0.264729237 -0.977228043 0.493632169 -0.401821398 -0.286231779 0.143371278 -0.360231532 0.340762717 0.633162512 -0.710530502 -1.334690191 0.158108045 -0.347820435 -0.074497061 -0.970507716 -0.26479443 -0.298648517 -0.10090872 -0.11742112 27 | Gene: PRKG2 Gene Type: Not Interesting -0.185695405 -0.173758799 0.084357105 1.826502656 0.00816719 -1.102148634 0.299002536 0.458848186 0.292508806 0.110508201 0.083592283 -0.494333063 -0.117947546 -0.539712481 -0.106334279 -0.403083002 -0.789473381 1.041787363 1.70041072 -0.293951867 4.839524758 1.015480815 0.841188534 -0.620389764 -0.565583764 -0.262366184 0.226425315 -0.048000565 1.126249373 28 | Gene: MAPK4 Gene Type: Interesting 0.184462349 -0.526037871 0.432087272 -0.882311913 0.246356093 0.858754521 0.052858019 -1.118340603 -0.846948816 -0.778824075 3.525192777 -1.872745007 -0.779756435 -1.039639399 -0.59333431 0.402156007 -1.387426464 -0.145435051 -0.46497243 -0.221064461 -0.861483648 0.125415634 -0.191849116 2.374460297 -0.74142144 0.7654394 1.029796862 0.03307866 0.44066582 29 | Gene: MAPK11 Gene Type: Interesting 1.760301448 -0.912259652 -1.163345889 -0.965891664 -0.795153414 -0.616300339 -1.360743997 -1.448291877 -0.024088935 -1.188868793 -0.229906845 2.181489143 -1.154435684 6.28292787 -0.303782002 -0.165568925 -1.126153349 1.678721355 -1.683560793 -0.864063548 -0.025445472 1.890946219 0.667805988 -0.625764381 -1.063340313 3.222816803 -0.001359619 -0.203661756 0.187669924 30 | Gene: STK31 Gene Type: Interesting -0.07364355 -0.103789279 -0.171304836 0.351910065 0.63677969 -0.136732984 0.356830815 3.889115824 0.645442526 1.366358918 0.995319244 5.608685402 1.101919141 -0.554900568 0.087820649 0.061305127 1.931275557 -0.692417574 -0.481807702 -0.16288735 -0.538298189 2.440412245 0.804274605 -0.605526195 1.788457016 -0.376520922 0.35819202 0.164487781 3.719306763 31 | Gene: GRK1 Gene Type: Not Interesting -0.751526741 0.49762292 -0.142534658 -0.882124083 -1.151282849 2.307907188 -0.12032085 -0.351269532 -1.526178564 -0.753268428 3.600861739 -1.223995853 -0.607229424 -0.027417898 0.190161632 0.610550408 0.149796331 -0.122879865 0.247865963 -0.404833708 0.736929754 -0.944275068 -0.078919294 0.661648005 -0.244948779 3.051534602 -0.107365228 0.367536408 -1.517985824 32 | Gene: ROS1 Gene Type: Interesting -0.31236414 0.701257089 0.47520812 -0.585297054 -0.122694283 -0.866875137 0.367939523 -0.481103706 2.072237711 10.29186436 1.298805701 -0.628175917 -0.173084375 -0.02710755 0.355169073 0.470456905 0.121400231 0.374924602 -0.278307341 -0.553746266 -0.935156558 -0.042420296 
-0.479479902 -0.332400886 -0.710017011 1.873931755 0.204554429 -0.32315246 0.187572521 33 | Gene: MAP2K4 Gene Type: Interesting 0.11931136 0.593670684 0.489152771 0.841683345 1.064673748 0.095113499 1.050152022 1.891488427 -5.5283552 0.64306832 -1.100026181 0.765710935 1.165406655 0.30638633 -1.365894262 0.635492291 -0.377798616 0.521665309 -0.608497433 0.398484128 -0.988354968 1.36349214 1.36269783 -0.112291585 -0.262719995 0.503524059 0.498006014 1.525942005 0.339189212 34 | Gene: SRC Gene Type: Interesting -0.294263824 -0.618071649 -0.252534114 -0.78660676 -0.228026664 0.977860794 -1.200449832 -0.22037931 -0.240489906 -0.201675468 1.47598938 -0.557000568 -0.502553204 -0.437501309 0.966927023 0.379670097 0.048795579 0.250622869 2.961024714 2.299033235 -1.210659274 0.418655141 1.161954005 -0.15700654 -1.254142937 -0.574558055 -0.662438275 3.702617515 -0.35302723 35 | Gene: TGFBR1 Gene Type: Interesting -0.000863802 0.735638383 -0.680289747 0.040925843 0.359330228 -1.587400295 -1.041686081 0.071551408 -0.168322665 -1.377303308 3.604539089 -0.004601068 1.527568732 -0.300154707 -0.786135509 -0.138050924 -0.366480418 -0.796970206 -0.030155544 0.803100056 0.683145561 -0.900708154 0.15251077 0.140092011 0.376815421 -1.214319621 1.326197465 1.523070279 -1.312001824 36 | Gene: CAMK2B Gene Type: Not Interesting -0.276736819 -0.426080887 -0.160160461 -0.890032771 -0.437405434 0.143897214 -0.573425958 -0.486419381 -0.536963482 -0.657041002 -0.473345418 -0.237475279 -0.669396538 -0.559435302 0.038953301 0.033709721 -0.343587801 -0.513218087 -0.592303313 -0.431221835 5.339202897 -0.493778587 -0.645000361 -0.477984867 -0.401579746 -0.621782124 -0.249394627 -0.303365249 -0.922343302 37 | Gene: STK24 Gene Type: Interesting -0.31807579 -0.814110809 0.646545188 0.26837169 -9.425120961 -1.073853473 -2.049589626 -0.346921024 0.997283181 0.300619253 -0.543103864 -1.150792172 -2.283061167 -0.162802216 -1.053859713 -1.377541743 -0.288349474 -0.922266884 -1.123953091 -0.762953893 -0.687357148 0.28991073 0.317576672 -0.345565515 0.541683 0.009754099 0.73792006 -0.624271752 0.100532547 38 | Gene: DCLK3 Gene Type: Not Interesting -0.670177714 3.224533501 0.145509552 0.107432319 -1.120492739 0.288890539 1.549545918 -0.342665051 -0.017402855 -0.420002244 -0.361387453 -1.264272075 -0.794507765 -0.619944678 -0.338767802 -0.148529478 -1.078879645 0.130939014 -1.307815313 -1.818798474 3.683694337 0.920647357 -0.847056974 -0.343798498 -1.21552566 -0.853845334 -0.357215055 -0.043911541 -0.955847309 39 | Gene: LATS1 Gene Type: Not Interesting -0.695252888 4.299877134 -0.175587126 -0.061022137 -0.391646018 3.385451038 0.345114288 -0.505734993 -0.482953864 -0.081815586 -0.928486879 0.976209137 0.099021487 2.494690556 -1.088742779 0.437174751 -0.507169467 2.028724319 -0.507954247 0.143506281 -1.19702953 0.610379518 0.095879151 -0.663118727 0.50821984 -0.741815419 2.38531026 0.354750355 0.658437634 40 | Gene: NEK9 Gene Type: Not Interesting -0.337849025 -0.535265918 0.803160459 0.275911465 0.981343049 -0.748451144 -0.092431408 -0.326477104 -0.381243917 -0.575343824 -0.63351617 -0.380961411 -1.720616197 -0.85605361 -0.580950374 0.373293116 0.905490886 0.135705555 1.107780656 -0.545183144 0.475561701 0.016687596 -0.172178219 0.585186686 -0.40480014 -3.997318149 0.711029765 -0.470884061 0.354386296 41 | Gene: MYLK3 Gene Type: Not Interesting -0.368173217 0.209192446 0.266317555 -0.100656799 -0.336791718 -0.060827204 -0.199021599 -0.765882671 -0.071476548 -0.4402703 -0.3548684 3.468121376 5.853726714 -0.465135408 0.074434692 
7.085199705 -0.399050575 -0.334999773 -0.623071147 -0.406230833 0.939058116 -0.269533885 0.117950503 0.1975473 -0.365407931 -0.056856473 0.001983212 0.081609959 -0.603299855 -------------------------------------------------------------------------------- /txt/rc_val_cats.txt: -------------------------------------------------------------------------------- 1 | col-A col-B col-C 2 | 1 3 2 3 | row-A 1 1 2 3 4 | row-B 2 10 11 12 5 | row-C 3 7 8 9 6 | row-D 7 4 5 6 --------------------------------------------------------------------------------
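The matrices above are the tab-separated example inputs shipped with the repo: rc_two_cats.txt pairs each column with Cell Line, Category, and Gender labels and each row with Gene and Gene Type labels, while rc_val_cats.txt attaches a numeric value category to every row and column. Below is a minimal sketch of how such a file is typically turned into a Clustergrammer visualization JSON, assuming the clustergrammer package from this repo is installed and the working directory is the repo root; method names follow the package's documented usage and may vary slightly between releases.

```python
# Minimal sketch, not part of the repo dump above.
from clustergrammer import Network

net = Network()

# rc_two_cats.txt: tab-separated matrix whose first rows carry column
# categories (Cell Line, Category, Gender) and whose first columns carry
# row categories (Gene, Gene Type); all remaining cells are numeric.
net.load_file('txt/rc_two_cats.txt')

# Hierarchically cluster rows and columns and build the view data
# (older releases expose the same step as net.make_clust()).
net.cluster()

# Write the JSON consumed by the Clustergrammer front end;
# the output path here is illustrative.
net.write_json_to_file('viz', 'json/mult_view.json')
```

Judging from the file tree, make_clustergrammer.py at the repo root performs essentially this conversion to produce the example views under json/ (e.g. mult_view.json), with the multi-level row and column labels surfacing as the category bars in the rendered heatmap.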