├── Agenda.pdf ├── Notebooks ├── Brief Introduction.ipynb ├── Pahl_NotebookTools_Tutorial.ipynb ├── README.md ├── State of the toolkit.distrib.ipynb ├── Stiefl_RDKitPh4FullPublication.ipynb ├── Whats New.ipynb ├── data │ ├── Target_no_65.pkl │ └── chembl_cyps.head.sdf └── images │ ├── 1py5Ph44pointph4InPocket.png │ ├── 1py5Ph4DistanceExample.png │ ├── 1py5Ph4DistancesOrig.png │ ├── 4PointAlignment3.png │ ├── 4PointAlignment6.png │ ├── 4PointAlignmentAll.png │ ├── 4PointAlignmentAllN.png │ ├── 4PointAlignmentCheat6.png │ ├── 4PointAlignmentCheat6N.png │ ├── KNIME_coords_and_smiles.png │ ├── KNIME_coords_and_smiles_out.png │ ├── KNIME_descriptors.png │ ├── KNIME_descriptors_out.png │ ├── KNIME_descrs.png │ ├── KNIME_descrs_missing.png │ ├── KNIME_generate_3d_coords.png │ ├── KNIME_generate_3d_coords_out.png │ ├── KNIME_generate_confs.png │ ├── KNIME_generate_confs_out.png │ ├── KNIME_sanitization.png │ ├── T5.shaded.132.png │ ├── alignEmbed3Point.png │ ├── alignEmbed3PointN.png │ ├── docs_overview.png │ ├── docs_zoom.png │ ├── ecodesystem.png │ ├── kinaseOverview.png │ ├── logo.lrg.png │ ├── ph4_tutorial.png │ └── tutorial_example.png ├── Presentations ├── BrianKelley-NovartisChemicalUniverse.pdf ├── Brown_OriginsOf3D.pdf ├── Ehmki_and_KramerMatchedMolecularSeries.pdf ├── Flachsenberg_RingDecomposerLib.pdf ├── Godin_OneCentralTool_Lightning.pdf ├── JohnMayfield_Depiction.pdf ├── Landrum_Schneider_GitHub_Git_and_RDKit.pdf ├── Pahl_NotebookTools_Intro.pdf ├── PaoloTosco_OpenMM_RDKit_integration.pdf ├── README.md ├── Ruedisser_SelectivityProteases_talk.ipynb ├── Sayle_RDKitTautomers.pdf ├── Schwarze_RDKit_UGM_Oct_2016_How_to_Develop_New_RDKit_Nodes.pdf ├── Schwarze_RDKit_UGM_Oct_2016_Workshop_Writing_RDKit_KNIME_Nodes_Hands-On.pdf ├── SelectivityMaps.py └── Vianello_FasterSimilarityQueries.pdf ├── README.md └── Tutorials ├── Part1_Toy_data_example_and_overfitting_risks.ipynb ├── Part2_Descriptors_and_regression.ipynb ├── Part3_Fingerprints_and_classification.ipynb └── 
README.md /Agenda.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Agenda.pdf -------------------------------------------------------------------------------- /Notebooks/README.md: -------------------------------------------------------------------------------- 1 | Placeholder 2 | -------------------------------------------------------------------------------- /Notebooks/State of the toolkit.distrib.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false, 8 | "slideshow": { 9 | "slide_type": "skip" 10 | } 11 | }, 12 | "outputs": [ 13 | { 14 | "name": "stderr", 15 | "output_type": "stream", 16 | "text": [ 17 | "/Library/Python/2.7/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.\n", 18 | " \"`IPython.html.widgets` has moved to `ipywidgets`.\", ShimWarning)\n" 19 | ] 20 | } 21 | ], 22 | "source": [ 23 | "import gzip\n", 24 | "from rdkit import Chem\n", 25 | "from rdkit.Chem import Draw,AllChem\n", 26 | "from rdkit.Chem.Draw import IPythonConsole\n", 27 | "from IPython.display import Image\n" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "slideshow": { 34 | "slide_type": "slide" 35 | } 36 | }, 37 | "source": [ 38 | "\n", 39 | "\n", 40 | "\n", 42 | "\n", 43 | "\n", 44 | "\n", 47 | "\n", 48 | "\n", 49 | "\n", 55 | "\n", 58 | "
\n", 41 | "
\n", 45 | "

RDKit: State of the toolkit (2016 UGM edition)

\n", 46 | "
\n", 50 | "Greg Landrum, Ph.D.
\n", 51 | "T5 Informatics, KNIME.com
\n", 52 | "Basel, Switzerland
\n", 53 | "\"T5\n", 54 | "
\n", 56 | "\"RDKit\n", 57 | "
" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "slideshow": { 65 | "slide_type": "slide" 66 | } 67 | }, 68 | "source": [ 69 | "# An overview of the RDKit\n", 70 | "\n" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": { 76 | "slideshow": { 77 | "slide_type": "subslide" 78 | } 79 | }, 80 | "source": [ 81 | "## Open-source toolkit for cheminformatics\n", 82 | "- Business-friendly BSD license\n", 83 | "- Core data structures and algorithms in C++\n", 84 | "- Python (2.x and 3.x) wrapper generated using Boost.Python\n", 85 | "- Java and C\\# wrappers generated with SWIG\n", 86 | "- 2D and 3D molecular operations\n", 87 | "- Descriptor generation for machine learning\n", 88 | "- Molecular database cartridge for PostgreSQL\n", 89 | "- Cheminformatics nodes for KNIME (distributed from the KNIME community site: http://tech.knime.org/community/rdkit)\n" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": { 95 | "slideshow": { 96 | "slide_type": "subslide" 97 | } 98 | }, 99 | "source": [ 100 | "## Ecosystem\n", 101 | "\n", 102 | "![RDKit ecosystem](images/ecodesystem.png)\n", 103 | "\n", 104 | "*Exact same algorithms/implementations accessible from many different endpoints*" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": { 110 | "slideshow": { 111 | "slide_type": "subslide" 112 | } 113 | }, 114 | "source": [ 115 | "## Operational\n", 116 | "- http://www.rdkit.org\n", 117 | "- Supports Mac/Windows/Linux\n", 118 | "- Releases every 6 months\n", 119 | "- Web presence:\n", 120 | " - Homepage: http://www.rdkit.org\n", 121 | " Documentation, links\n", 122 | " - Github (https://github.com/rdkit)\n", 123 | " Downloads, bug tracker, git repository\n", 124 | " - Sourceforge (http://sourceforge.net/projects/rdkit)\n", 125 | " Mailing lists\n", 126 | " - Blog (https://rdkit.blogspot.com)\n", 127 | " Tips, tricks, random stuff\n", 128 | " - Tutorials 
(https://github.com/rdkit/rdkit-tutorials)\n", 129 | " Jupyter-based tutorials for using the RDKit\n", 130 | " - KNIME integration (https://github.com/rdkit/knime-rdkit)\n", 131 | " RDKit nodes for KNIME\n", 132 | "- Mailing lists at https://sourceforge.net/p/rdkit/mailman/, searchable archives available for [rdkit-discuss](http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/) and [rdkit-devel](http://www.mail-archive.com/rdkit-devel@lists.sourceforge.net/)\n", 133 | "- Social media:\n", 134 | " - Twitter: @RDKit_org\n", 135 | " - LinkedIn: https://www.linkedin.com/groups/8192558\n", 136 | " - Google+: https://plus.google.com/u/0/116996224395614252219\n", 137 | " - Slack: https://rdkit.slack.com (invite required, contact Greg)" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": { 143 | "slideshow": { 144 | "slide_type": "subslide" 145 | } 146 | }, 147 | "source": [ 148 | "## History and Milestones:\n", 149 | "- 2000-2006: initial development work at Rational Discovery\n", 150 | "- 2006: code open sourced and released on sourceforge.net\n", 151 | "- 2007: First NIBR contribution (chemical reaction handling); Noel discovers the RDKit (=first rdkit-discuss post?)\n", 152 | "- 2008: first POC of Java wrapper; Mac support added; SLN and Mol2 parsers; \n", 153 | "- 2009: Morgan fingerprints; switch to cmake; switch to VF2 for SSS\n", 154 | "- 2010: PostgreSQL cartridge; First iteration of the KNIME nodes; $RDBASE/Contrib appears; SaltRemover and FunctionalGroups code\n", 155 | "- 2011: New Java wrappers; more functionality moved to C++; InChI support; Avalontools integration\n", 156 | "- 2012: First UGM; Speed improvements; MCS implementation; IPython integration; “RDKit Cookbook” appears\n", 157 | "- 2013: Move to github; Pandas integration; MMFF and Open3DAlign support; PDB support; rdkit blog started\n", 158 | "- 2014: python3 support; conda integration; experimental lucene integration; MCS implementation in C++\n", 159 | "- 2015: 
new drawing code; improved canonicalization algorithm; improved 3D coordinate generation; reduced memory usage\n", 160 | "- 2016: Regular patch releases; easier builds; performance improvements; KNIME nodes move to Github " 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": { 166 | "slideshow": { 167 | "slide_type": "subslide" 168 | } 169 | }, 170 | "source": [ 171 | "## Functionality Overview: Basics\n", 172 | "- Input/Output: SMILES/SMARTS, SDF, TDT, SLN [1](#footnote1), Corina mol2 [1](#footnote1), PDB, sequence notation, FASTA (peptides only), HELM (peptides only)\n", 173 | "- Substructure searching\n", 174 | "- Canonical SMILES\n", 175 | "- Chirality support (i.e. R/S or E/Z labeling)\n", 176 | "- Chemical transformations (e.g. remove matching substructures)\n", 177 | "- Chemical reactions\n", 178 | "- Molecular serialization (e.g. mol \\<-\\> text)\n", 179 | "- 2D depiction, including constrained depiction\n", 180 | "- Fingerprinting: Daylight-like, atom pairs, topological torsions, Morgan algorithm, “MACCS keys”, extended reduced graphs, etc.\n", 181 | "- Similarity/diversity picking\n", 182 | "- Gasteiger-Marsili charges\n", 183 | "- Bemis and Murcko scaffold determination\n", 184 | "- Salt stripping\n", 185 | "- Functional-group filters" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": { 191 | "slideshow": { 192 | "slide_type": "subslide" 193 | } 194 | }, 195 | "source": [ 196 | "## Functionality Overview: 2D\n", 197 | "- 2D pharmacophores [1](#footnote1)\n", 198 | "- Hierarchical subgraph/fragment analysis\n", 199 | "- RECAP and BRICS implementations\n", 200 | "- Multi-molecule maximum common substructure [2](#footnote2)\n", 201 | "- Enumeration of molecular resonance structures\n", 202 | "- Molecular descriptor library:\n", 203 | " - Topological (κ3, Balaban J, etc.)\n", 204 | " - Compositional (Number of Rings, Number of Aromatic Heterocycles, etc.)\n", 205 | " - Electrotopological state (Estate)\n", 206 | 
" - clogP, MR (Wildman and Crippen approach)\n", 207 | " - “MOE like” VSA descriptors\n", 208 | " - MQN [6](#footnote6)\n", 209 | "- Similarity Maps [7](#footnote7)\n", 210 | "- Machine Learning:\n", 211 | " - Clustering (hierarchical, Butina)\n", 212 | " - Information theory (Shannon entropy, information gain, etc.)\n", 213 | "- Tight integration with the [Jupyter](http://jupyter.org) notebook (formerly the IPython notebook) and [Pandas](http://pandas.pydata.org/).\n" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": { 219 | "slideshow": { 220 | "slide_type": "subslide" 221 | } 222 | }, 223 | "source": [ 224 | "## Functionality Overview: 3D\n", 225 | "- 2D-\\>3D conversion/conformational analysis via distance geometry \n", 226 | "- UFF and MMFF94/MMFF94S implementations for cleaning up structures\n", 227 | "- Pharmacophore embedding (generate a pose of a molecule that matches a 3D pharmacophore) [1](#footnote1)\n", 228 | "- Feature maps\n", 229 | "- Shape-based similarity\n", 230 | "- RMSD-based molecule-molecule alignment\n", 231 | "- Shape-based alignment (subshape alignment [3](#footnote3)) [1](#footnote1)\n", 232 | "- Unsupervised molecule-molecule alignment using the Open3DAlign algorithm [4](#footnote4)\n", 233 | "- Integration with PyMOL for 3D visualization\n", 234 | "- Molecular descriptor library:\n", 235 | " - PMI, NPR, PBF, etc.\n", 236 | " - Feature-map vectors [5](#footnote5)\n", 237 | "- Torsion Fingerprint Differences for comparing conformations [8](#footnote8)\n" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": { 243 | "slideshow": { 244 | "slide_type": "subslide" 245 | } 246 | }, 247 | "source": [ 248 | "## Documentation\n", 249 | "[Overview](http://rdkit.readthedocs.org/en/latest/):\n", 250 | "\n", 251 | "![docs overview](images/docs_overview.png)\n", 252 | "\n", 253 | "Generated with Sphinx (standard python documentation tool)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | 
"metadata": { 259 | "slideshow": { 260 | "slide_type": "subslide" 261 | } 262 | }, 263 | "source": [ 264 | "## Documentation\n", 265 | "[Sample](http://rdkit.readthedocs.org/en/latest/GettingStartedInPython.html#reading-single-molecules):\n", 266 | "\n", 267 | "![doc zoom](images/docs_zoom.png)\n", 268 | "\n", 269 | "All Python code samples are *tested* to protect against doc-rot.\n" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "slideshow": { 276 | "slide_type": "subslide" 277 | } 278 | }, 279 | "source": [ 280 | "## Tutorials\n", 281 | "\n", 282 | "[Github repo](https://github.com/rdkit/rdkit-tutorials)\n", 283 | "![tutorial](images/tutorial_example.png)\n", 284 | "\n", 285 | "All Python code samples are *tested* to protect against doc-rot.\n" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": { 291 | "slideshow": { 292 | "slide_type": "subslide" 293 | } 294 | }, 295 | "source": [ 296 | "## Footnotes\n", 297 | "1: These implementations are functional but are not necessarily the best, fastest, or most complete.\n", 298 | "\n", 299 | "2: Originally contributed by Andrew Dalke\n", 300 | "\n", 301 | "3: Putta, S., Eksterowicz, J., Lemmen, C. & Stanton, R. \"A Novel Subshape Molecular Descriptor\" *Journal of Chemical Information and Computer Sciences* **43:1623–35** (2003).\n", 302 | "\n", 303 | "4: Tosco, P., Balle, T. & Shiri, F. \"Open3DALIGN: an open-source software aimed at unsupervised ligand alignment.\" *J Comput Aided Mol Des* **25:777–83** (2011).\n", 304 | "\n", 305 | "5: Landrum, G., Penzotti, J. & Putta, S. \"Feature-map vectors: a new class of informative descriptors for computational drug discovery\" *Journal of Computer-Aided Molecular Design* **20:751–62** (2006).\n", 306 | "\n", 307 | "6: Nguyen, K. T., Blum, L. C., van Deursen, R. & Reymond, J.-L. \"Classification of Organic Molecules by Molecular Quantum Numbers.\" *ChemMedChem* **4:1803–5** (2009).\n", 308 | "\n", 309 | "7: Riniker, S. 
& Landrum, G. A. \"Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods.\" *Journal of Cheminformatics* **5:43** (2013).\n", 310 | "\n", 311 | "8: Schulz-Gasch, T., Schärfer, C., Guba, W. & Rarey, M. \"TFD: Torsion Fingerprints As a New Measure To Compare Small Molecule Conformations.\" *J. Chem. Inf. Model.* **52:1499–1512** (2012).\n", 312 | "\n", 313 | "9: Riniker, S. & Landrum, G. A. \"Better informed distance geometry: Using what we know to improve conformation generation.\" *J. Chem. Inf. Model.* **55:2562–74** (2015). \n" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "slideshow": { 320 | "slide_type": "slide" 321 | } 322 | }, 323 | "source": [ 324 | "# Sustainability\n", 325 | "\n", 326 | "Solving the bus problem...\n", 327 | "\n", 328 | "- This clearly isn’t just a hobby project any more\n", 329 | "- Used internally in NIBR and other companies in multiple production systems\n", 330 | "- Contributions (features, bug fixes, etc) coming in from the community, including from other companies\n", 331 | "- I’m no longer the only one answering questions on the mailing list\n", 332 | "- Part of other open-source projects\n" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": { 338 | "slideshow": { 339 | "slide_type": "slide" 340 | } 341 | }, 342 | "source": [ 343 | "# Community\n", 344 | "\n", 345 | "The core of any open-source project\n" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": { 351 | "slideshow": { 352 | "slide_type": "subslide" 353 | } 354 | }, 355 | "source": [ 356 | "## Who's using it?\n", 357 | "\n", 358 | "Hard to say with any certainty\n", 359 | "\n", 360 | "- Active contributors to the mailing list from:\n", 361 | " - Big pharma\n", 362 | " - Small pharma/biotech\n", 363 | " - Software/Services\n", 364 | " - Academia\n", 365 | "- All of the last three UGMs at capacity with 40+ attendees\n", 366 | "- Contributions coming 
from the community:\n", 367 | " - bug reports \n", 368 | " - wiki pages\n", 369 | " - code and documentation patches\n", 370 | " - changes to the build system\n", 371 | " - active use in other systems.\n", 372 | "- Community contributions for packaging:\n", 373 | " - rpms/debs for Fedora/Debian linux\n", 374 | " - homebrew recipe for MacOS\n", 375 | " - conda packages\n" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": { 381 | "slideshow": { 382 | "slide_type": "subslide" 383 | } 384 | }, 385 | "source": [ 386 | "## Contrib dir\n", 387 | "\n", 388 | "The Contrib directory, part of the standard RDKit distribution, includes code that has been contributed by members of the community.\n", 389 | "\n", 390 | "### LEF: Local Environment Fingerprints\n", 391 | "\n", 392 | "Contains python source code from the publications:\n", 393 | "\n", 394 | "- A. Vulpetti, U. Hommel, G. Landrum, R. Lewis and C. Dalvit, \"Design and NMR-based screening of LEF, a library of chemical fragments with different Local Environment of Fluorine\" *J. Am. Chem. Soc.* **131** (2009) 12949-12959. http://dx.doi.org/10.1021/ja905207t\n", 395 | "- Vulpetti, G. Landrum, S. Ruedisser, P. Erbel and C. Dalvit, \"19F NMR Chemical Shift Prediction with Fluorine Fingerprint Descriptor\" *J. of Fluorine Chemistry* **131** (2010) 570-577. http://dx.doi.org/10.1016/j.jfluchem.2009.12.024\n", 396 | "\n", 397 | "Contribution from Anna Vulpetti\n", 398 | "\n", 399 | "### M\\_Kossner\n", 400 | "\n", 401 | "Contains a set of pharmacophoric feature definitions as well as code for finding molecular frameworks.\n", 402 | "\n", 403 | "Contribution from Markus Kossner\n", 404 | "\n", 405 | "### PBF: Plane of best fit\n", 406 | "\n", 407 | "Contains C++ source code and sample data from the publication:\n", 408 | "\n", 409 | "Firth, N. Brown, and J. 
Blagg, \"Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules\" *Journal of Chemical Information and Modeling* **52** 2516-2525 (2012). http://pubs.acs.org/doi/abs/10.1021/ci300293f\n", 410 | "\n", 411 | "Contribution from Nicholas Firth\n", 412 | "\n", 413 | "### mmpa: Matched molecular pairs\n", 414 | "\n", 415 | "Python source and sample data for an implementation of the matched molecular pair algorithm described in the publication:\n", 416 | "\n", 417 | "Hussain, J., & Rea, C. \"Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets.\" *Journal of Chemical Information and Modeling* **50** 339-348 (2010). http://dx.doi.org/10.1021/ci900450m\n", 418 | "\n", 419 | "Includes a fragment indexing algorithm from the publication:\n", 420 | "\n", 421 | "Wagener, M., & Lommerse, J. P. \"The quest for bioisosteric replacements.\" *Journal of Chemical Information and Modeling* **46** 677-685 (2006).\n", 422 | "\n", 423 | "Contribution from Jameed Hussain.\n", 424 | "\n", 425 | "### SA\\_Score: Synthetic accessibility score\n", 426 | "\n", 427 | "Python source for an implementation of the SA score algorithm described in the publication:\n", 428 | "\n", 429 | "Ertl, P. and Schuffenhauer A. 
\"Estimation of Synthetic Accessibility Score of Drug-like Molecules based on Molecular Complexity and Fragment Contributions\" *Journal of Cheminformatics* **1:8** (2009)\n", 430 | "\n", 431 | "Contribution from Peter Ertl\n", 432 | "\n", 433 | "### fraggle: A fragment-based molecular similarity algorithm\n", 434 | "\n", 435 | "Python source for an implementation of the fraggle similarity algorithm developed at GSK and described in this RDKit UGM presentation: https://github.com/rdkit/UGM_2013/blob/master/Presentations/Hussain.Fraggle.pdf\n", 436 | "\n", 437 | "Contribution from Jameed Hussain\n", 438 | "\n", 439 | "### pzc: Tools for building and validating classifiers\n", 440 | "\n", 441 | "Contribution from Paul Czodrowski\n", 442 | "\n", 443 | "### ConformerParser: parser for Amber trajectory files\n", 444 | "\n", 445 | "Contribution from Sereina Riniker\n", 446 | "\n", 447 | "### AtomAtomSimilarity: atom-atom-path method for fragment similarity\n", 448 | "\n", 449 | "Python source for an implementation of the Atom-Atom-Path similarity method for fragments described in the publication:\n", 450 | "\n", 451 | "Gobbi, A., Giannetti, A. M., Chen, H. & Lee, M.-L. \"Atom-Atom-Path similarity and Sphere Exclusion clustering: tools for prioritizing fragment hits.\" *J. Cheminformatics* **7:11** (2015). 
http://dx.doi.org/10.1186/s13321-015-0056-8\n", 452 | "\n", 453 | "Contribution from Richard Hall" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": { 459 | "slideshow": { 460 | "slide_type": "subslide" 461 | } 462 | }, 463 | "source": [ 464 | "## Integration into other projects\n", 465 | "\n", 466 | "- [ChEMBL Beaker](https://github.com/mnowotka/chembl_beaker) - standalone web server wrapper for RDKit and OSRA\n", 467 | "- [myChEMBL](https://github.com/chembl/mychembl) ([blog post](http://chembl.blogspot.de/2013/10/chembl-virtual-machine-aka-mychembl.html), [paper](http://bioinformatics.oxfordjournals.org/content/early/2013/11/20/bioinformatics.btt666)) - A virtual machine implementation of open data and cheminformatics tools\n", 468 | "- [ZINC](http://zinc15.docking.org) - Free database of commercially-available compounds for virtual screening\n", 469 | "- [Coot](https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/) - software for macromolecular model building, model completion and validation\n", 470 | "- [sdf_viewer.py](https://github.com/apahl/sdf_viewer) - an interactive SDF viewer\n", 471 | "- [sdf2ppt](https://github.com/dkuhn/sdf2ppt) - Reads an SDFile and displays molecules as image grid in powerpoint/openoffice presentation.\n", 472 | "- [MolGears](https://github.com/admed/molgears) - A cheminformatics tool for bioactive molecules\n", 473 | "- [PYPL](http://www.biochemfusion.com/downloads/#OracleUtilities) - Simple cartridge that lets you call Python scripts from Oracle PL/SQL.\n", 474 | "- [shape-it-rdkit](https://github.com/jandom/shape-it-rdkit) - Gaussian molecular overlap code shape-it (from silicos it) ported to RDKit backend\n", 475 | "- [WONKA](http://wonka.sgc.ox.ac.uk/WONKA/) - Tool for analysis and interrogation of protein-ligand crystal structures\n", 476 | "- [OOMMPPAA](http://oommppaa.sgc.ox.ac.uk/OOMMPPAA/) - Tool for directed synthesis and data analysis based on protein-ligand crystal structures\n", 477 | "- 
[OCEAN](https://github.com/rdkit/OCEAN) - web-tool for target-prediction of chemical structures which uses ChEMBL as datasource\n", 478 | "- [chemfp](http://chemfp.com)\n", 479 | "- [rdkit_ipynb_tools](https://github.com/apahl/rdkit_ipynb_tools) - RDKit Tools for the IPython Notebook\n", 480 | "- [chemicalite](https://github.com/rvianello/chemicalite) - SQLite integration for the RDKit\n", 481 | "- [django-rdkit](https://github.com/rdkit/django-rdkit) - Django integration for the RDKit\n", 482 | "- [Vernalis KNIME nodes](https://tech.knime.org/book/vernalis-nodes-for-knime-trusted-extension)\n", 483 | "- [Erlwood KNIME nodes](https://tech.knime.org/community/erlwood)\n", 484 | "- [AZOrange](https://github.com/AZcompTox/AZOrange)\n" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": { 490 | "slideshow": { 491 | "slide_type": "slide" 492 | } 493 | }, 494 | "source": [ 495 | "\n", 496 | "\n", 497 | "\n", 501 | "\n", 503 | "\n", 504 | "\n", 505 | "\n", 514 | "
\n", 498 | "

Support

\n", 499 | "Another critical piece for any software project, open-source or otherwise.\n", 500 | "
\n", 502 | "
\n", 506 | "

Options:

\n", 507 | "
    \n", 508 | "
  • RDKit mailing list
  • \n", 509 | "
  • Github
  • \n", 510 | "
  • RDKit slack channel
  • \n", 511 | "
  • Commercial (via T5 Informatics)
  • \n", 512 | "
\n", 513 | "
" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": { 520 | "slideshow": { 521 | "slide_type": "subslide" 522 | } 523 | }, 524 | "source": [ 525 | "## Patch Releases\n", 526 | "\n", 527 | "- Starting with the 2016_03 version of the RDKit, we've been doing patch releases: about once a month we release a new version with just bug fixes.\n", 528 | "- The changes are documented and these should always be safe to install.\n", 529 | "- Some more thoughts on this appeared on the [T5 Informatics Blog](https://medium.com/@greg.landrum_t5/a-new-ish-rdkit-release-model-3efa17ff54b7#.8agtoocol)\n", 530 | "\n", 531 | "Here's an example of the release notes from a [patch release](https://github.com/rdkit/rdkit/releases/tag/Release_2016_03_5):\n", 532 | "```\n", 533 | " # Release_2016.03.5\n", 534 | " (Changes relative to Release_2016.03.4)\n", 535 | "\n", 536 | " ## Acknowledgements:\n", 537 | " Piotr Dabrowski, Markus Metz, Stephen Roughley, Riccardo Vianello\n", 538 | "\n", 539 | " ## Bug Fixes:\n", 540 | " - GetSSSR interrupts by segmentation fault\n", 541 | " (github issue #1023 from PiotrDabr)\n", 542 | " - typos in MMPA hash code\n", 543 | " (github issue #1044 from greglandrum)\n", 544 | " - Bond::BondDir::EITHERDOUBLE not exposed to python\n", 545 | " (github issue #1051 from greglandrum)\n", 546 | " - Fix leak with renumberAtoms() in the SWIG wrappers\n", 547 | " (github pull #1064 from greglandrum)\n", 548 | " - computeInitialCoords() should call the SSSR code before it calls assignStereochemistry()\n", 549 | " (github issue #1073 from greglandrum)\n", 550 | "```\n", 551 | "\n" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": { 557 | "slideshow": { 558 | "slide_type": "subslide" 559 | } 560 | }, 561 | "source": [ 562 | "\n", 563 | "\n", 564 | "\n", 571 | "\n", 573 | "\n", 574 | "
\n", 565 | "

Long-term support releases

\n", 566 | "
    \n", 567 | "
  • The idea here is borrowed from Ubuntu Linux and some other open-source software packages: we will occasionally designate an RDKit release as a \"long-term support release\" (or \"long-term release\", the terminology isn't quite settled). We'll create patch releases with bug fixes against these releases for a fixed period of time, likely two years.\n", 568 | "
  • Some more thoughts on this appeared on the T5 Informatics Blog
  • \n", 569 | "
\n", 570 | "
\n", 572 | "
\n" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": { 580 | "slideshow": { 581 | "slide_type": "slide" 582 | } 583 | }, 584 | "source": [ 585 | "# The Future\n", 586 | "\n", 587 | "Future work tends to be determined by what's needed for active projects or requests that come out of the community. So there's not really a roadmap.\n", 588 | "\n", 589 | "But sometimes it's obvious. Here's a set of things that are already on the \"ToDo\" list or that are just being thought about." 590 | ] 591 | }, 592 | { 593 | "cell_type": "markdown", 594 | "metadata": { 595 | "slideshow": { 596 | "slide_type": "subslide" 597 | } 598 | }, 599 | "source": [ 600 | "## Some larger-scale backend changes\n", 601 | "\n", 602 | "### Moving to modern C++. \n", 603 | "\n", 604 | "The goal is to modernize the C++ codebase and allow the developers to work with more up-to-date tools.\n", 605 | "This may also have a positive impact on performance/stability. \n", 606 | "\n", 607 | "We will start this after the \"Q3\" 2016 release. Here's some [more detail](https://medium.com/@greg.landrum_t5/the-rdkit-and-modern-c-48206b966218).\n", 608 | "\n", 609 | "### Starting to make backwards-incompatible changes.\n", 610 | "\n", 611 | "The goal is to make the RDKit API simpler to use and understand and clean up some of the \"less than optimal\" decisions made early in the development of the toolkit.\n", 612 | "\n", 613 | "Some specific planned changes:\n", 614 | "- Dealing with the explicit/implicit hydrogen mess\n", 615 | "- Improvements to the representation of stereochemistry\n", 616 | "\n", 617 | "We will put a process in place for doing this, but not necessarily start, before the Q1 2017 release. \n", 618 | "Here's some [more detail](https://medium.com/@greg.landrum_t5/breaking-with-the-past-making-backwards-incompatible-changes-in-the-rdkit-68e006579663)." 
619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": { 624 | "slideshow": { 625 | "slide_type": "subslide" 626 | } 627 | }, 628 | "source": [ 629 | "## Some upcoming features/improvements\n", 630 | "\n", 631 | "- Get the structure checker to v1.0\n", 632 | "- Get the enumeration toolkit to v1.0\n", 633 | "- More 3D descriptors\n", 634 | "- Improved 3D integration in the jupyter notebook\n", 635 | "- More KNIME nodes!\n" 636 | ] 637 | }, 638 | { 639 | "cell_type": "markdown", 640 | "metadata": { 641 | "slideshow": { 642 | "slide_type": "subslide" 643 | } 644 | }, 645 | "source": [ 646 | "# Still lots of other stuff to do though..." 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": { 652 | "slideshow": { 653 | "slide_type": "subslide" 654 | } 655 | }, 656 | "source": [ 657 | "## Technical Debt/Code improvements\n", 658 | "\n", 659 | "- More demos/documentation for advanced functionality. *Started:* https://github.com/rdkit/rdkit-tutorials\n", 660 | "- Ongoing performance improvements\n", 661 | "- Explore use of GPUs\n", 662 | "- Extend and better document the SWIG wrappers\n", 663 | "- Switch to boost.logging" 664 | ] 665 | }, 666 | { 667 | "cell_type": "markdown", 668 | "metadata": { 669 | "slideshow": { 670 | "slide_type": "fragment" 671 | } 672 | }, 673 | "source": [ 674 | "[This](http://bukai.pharm.or.jp/bukai_kozo/SARNews/SARNews_19.pdf) shouldn't be the only available tutorial for the ph4 embedding functionality:\n", 675 | "![japanese ph4 tutorial](images/ph4_tutorial.png)" 676 | ] 677 | }, 678 | { 679 | "cell_type": "markdown", 680 | "metadata": { 681 | "slideshow": { 682 | "slide_type": "subslide" 683 | } 684 | }, 685 | "source": [ 686 | "## Integrations\n", 687 | "- Additional KNIME nodes\n", 688 | "- Additional functionality for the PostgreSQL cartridge\n", 689 | "- Getting the Lucene integration to v1\n", 690 | "- Improve 3D integration with IPython notebook\n", 691 | "- Interactive 2D sketches in IPython 
notebook\n", 692 | "- Continued exploration of RDKit use in Javascript via emscripten\n", 693 | "- Explore integration with one of the NoSQL document stores (i.e. MongoDb, CouchDb, etc.)\n", 694 | "- Explore integration with Spark" 695 | ] 696 | }, 697 | { 698 | "cell_type": "markdown", 699 | "metadata": { 700 | "slideshow": { 701 | "slide_type": "subslide" 702 | } 703 | }, 704 | "source": [ 705 | "## New features: 2D\n", 706 | "- ongoing improvements in molecule-drawing code\n", 707 | "- improved S group support\n", 708 | "- pure RDKit molecular standardization\n", 709 | "- get molecule hashing code to v1\n", 710 | "- canonical tautomer generation\n", 711 | "- canonical CTAB generation\n", 712 | "- robust and flexible R-group decomposition\n", 713 | "- implementation of a \"scaffold hopping\" fingerprint like ERG (extended reduced graphs)\n", 714 | "- improved query-query matching to allow pseudo-Markush substructure searches" 715 | ] 716 | }, 717 | { 718 | "cell_type": "markdown", 719 | "metadata": { 720 | "slideshow": { 721 | "slide_type": "subslide" 722 | } 723 | }, 724 | "source": [ 725 | "## New features: 3D\n", 726 | "- implicit solvent model for the force fields\n", 727 | "- implementation of a 3D pharmacophore fingerprint\n", 728 | "- go beyond basics for 3D pharmacophore analysis\n", 729 | "- get the pharmacophore embedding code to v1\n", 730 | "- implementation of one-or-more shape-based fingerprints\n", 731 | "- shape-based alignment\n", 732 | "- other alignment-free 3D similarity approaches\n", 733 | "- generation of molecular surfaces\n", 734 | "- molecular-interaction fields (to allow 3D QSAR)\n", 735 | "- template-guided embedding in a protein pocket" 736 | ] 737 | }, 738 | { 739 | "cell_type": "markdown", 740 | "metadata": { 741 | "slideshow": { 742 | "slide_type": "slide" 743 | } 744 | }, 745 | "source": [ 746 | "# Thanks!\n" 747 | ] 748 | } 749 | ], 750 | "metadata": { 751 | "anaconda-cloud": {}, 752 | "celltoolbar": "Slideshow", 753 | 
"hide_input": false, 754 | "kernelspec": { 755 | "display_name": "Python [default]", 756 | "language": "python", 757 | "name": "python3" 758 | }, 759 | "language_info": { 760 | "codemirror_mode": { 761 | "name": "ipython", 762 | "version": 3 763 | }, 764 | "file_extension": ".py", 765 | "mimetype": "text/x-python", 766 | "name": "python", 767 | "nbconvert_exporter": "python", 768 | "pygments_lexer": "ipython3", 769 | "version": "3.5.1" 770 | } 771 | }, 772 | "nbformat": 4, 773 | "nbformat_minor": 0 774 | } 775 | -------------------------------------------------------------------------------- /Notebooks/Stiefl_RDKitPh4FullPublication.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 3D pharmacophores in the rdkit" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "The RDKit features functionality to align molecules to pharmacophoric features in 3D. This can be used for a simple alignment as well as for a fast pharmacophore searching. Pharmcacophore features are based on so-called feature factories ( a sample file \"BaseFeatures.fdef\" can be found in the Data directory ). \n", 15 | "\n", 16 | "The method is a three step procedure:\n", 17 | "1. conversion of the 3D pharmacophore to a boundary matrix\n", 18 | "2. matching of the pharmacophore to the boundary matrix of the target molecule using triangle smoothing\n", 19 | "3. embedding of the molecule to the pharmacophore using distance geometry (UFF force field)\n", 20 | "\n", 21 | "Here, we provide some guidance on how to use the 3D pharmacophore tools in the RDKit based on an example. We will create a pharmacophore from an ALK5 kinase crystal structure (PDB:1PY5) and will show how to align a sample structures extracted from the crystal structure." 
22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "In the crystal structure, the main interaction patterns that can be identified are the hydrogen bonds between the ligand and H283 as well as the water W5 in the back pocket. In both cases the ligand interacts as an H-bond acceptor. In addition, there is a weak H-bond donor functionality interacting with D351. Apart from the directional interactions, the ligand has aromatic interactions with its quinoline as well as the back pocket pyridine.\n", 29 | "\n", 30 | "![ALK5 1PY5 binding mode](images/kinaseOverview.png)\n", 31 | "\n", 32 | "Based on that information we can now start setting up a pharmacophore (ph4)." 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 1, 38 | "metadata": { 39 | "collapsed": false 40 | }, 41 | "outputs": [], 42 | "source": [ 43 | "from __future__ import print_function\n", 44 | "from rdkit import RDConfig, Chem, Geometry, DistanceGeometry\n", 45 | "from rdkit.Chem import ChemicalFeatures, rdDistGeom, Draw, rdMolTransforms\n", 46 | "from rdkit.Chem.Pharm3D import Pharmacophore, EmbedLib\n", 47 | "from rdkit.Chem.Draw import IPythonConsole, DrawingOptions\n", 48 | "from rdkit.Numerics import rdAlignment" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### Setting up a pharmacophore" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "In the RDKit a pharmacophore is defined by an object with a set of free chemical features (like acceptors, donors etc.) and a boundary matrix that describes the location of the features with respect to each other. One can define such a pharmacophore object manually." 
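Before any bounds are loosened, the boundary matrix of such an object appears to hold the plain pairwise Euclidean distances between the feature points. The sketch below reproduces those distances in pure Python for the three feature positions used in this notebook; `distance_matrix` is a hypothetical helper written for illustration, not an RDKit function.

```python
import math

# Feature positions (in Angstrom) of the three-point ALK5 pharmacophore.
feature_coords = [
    (3.877, 7.014, 1.448),
    (7.220, 11.077, 5.625),
    (4.778, 8.432, 7.805),
]


def distance_matrix(points):
    """Return the symmetric matrix of Euclidean distances between points."""
    n = len(points)
    dmat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            dmat[i][j] = dmat[j][i] = d
    return dmat


dmat = distance_matrix(feature_coords)
for row in dmat:
    print(["%.5f" % value for value in row])
```

The off-diagonal values should agree with the entries of `samplePcophore._boundsMat` printed in this notebook (about 6.718, 6.575 and 4.209 Angstrom).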
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 2, 68 | "metadata": { 69 | "collapsed": false 70 | }, 71 | "outputs": [], 72 | "source": [ 73 | "import os.path\n", 74 | "fdef = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')\n", 75 | "featFactory = ChemicalFeatures.BuildFeatureFactory(fdef)\n", 76 | "samplePh4Feats = [ChemicalFeatures.FreeChemicalFeature('Acceptor', Geometry.Point3D(3.877, 7.014, 1.448)),\n", 77 | " ChemicalFeatures.FreeChemicalFeature('Acceptor', Geometry.Point3D(7.220, 11.077, 5.625)),\n", 78 | " ChemicalFeatures.FreeChemicalFeature('Acceptor', Geometry.Point3D(4.778, 8.432, 7.805))]\n", 79 | "samplePcophore = Pharmacophore.Pharmacophore(samplePh4Feats)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 3, 85 | "metadata": { 86 | "collapsed": false 87 | }, 88 | "outputs": [ 89 | { 90 | "data": { 91 | "text/plain": [ 92 | "array([[ 0. , 6.71795706, 6.57525467],\n", 93 | " [ 6.71795706, 0. , 4.20853763],\n", 94 | " [ 6.57525467, 4.20853763, 0. ]])" 95 | ] 96 | }, 97 | "execution_count": 3, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | } 101 | ], 102 | "source": [ 103 | "samplePcophore._boundsMat" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "When applied in drug discovery projects, pharmacophore features are usually described as fuzzy locations in space. To describe this kind of fuzziness it is possible to manually change the lower and upper bounds of the respective points in the pharmacophore boundary matrix. 
" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 4, 116 | "metadata": { 117 | "collapsed": false 118 | }, 119 | "outputs": [], 120 | "source": [ 121 | "samplePcophore.setLowerBound(0,1,6.0)\n", 122 | "samplePcophore.setUpperBound(0,1,7.0)\n", 123 | "samplePcophore.setLowerBound(0,2,6.0)\n", 124 | "samplePcophore.setUpperBound(0,2,7.0)\n", 125 | "samplePcophore.setLowerBound(1,2,3.5)\n", 126 | "samplePcophore.setUpperBound(1,2,5.0)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 5, 132 | "metadata": { 133 | "collapsed": false 134 | }, 135 | "outputs": [ 136 | { 137 | "data": { 138 | "text/plain": [ 139 | "array([[ 0. , 7. , 7. ],\n", 140 | " [ 6. , 0. , 5. ],\n", 141 | " [ 6. , 3.5, 0. ]])" 142 | ] 143 | }, 144 | "execution_count": 5, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "samplePcophore._boundsMat" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "However, mostly end users will describe a three dimensional pharmacophore using a graphical interface and then write out the details to a flat file. Here, we will show an application of this using a pharmacophore (based on the above) created using MOE." 
158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 6, 163 | "metadata": { 164 | "collapsed": false 165 | }, 166 | "outputs": [], 167 | "source": [ 168 | "moePh43Point = \"\"\"#moe:ph4que 2014.09\n", 169 | "#pharmacophore 7 tag t value *\n", 170 | "scheme t Unified matchsize i 0 use_Hs i 1 abspos i 0 title t $ useRval i 0 comment s $\n", 171 | "#feature 3 expr tt color ix x r y r z r r r ebits ix gbits ix m ix\n", 172 | "Acc df2f2 3.877 7.014 1.448 0.3 0 400 a64cff \n", 173 | "Acc df2f2 7.22 11.077 5.625 0.3 0 400 a64cff \n", 174 | "Don f20df2 4.778 8.432 7.805 0.3 0 400 a64cff\n", 175 | "#endpharmacophore\"\"\"" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 7, 181 | "metadata": { 182 | "collapsed": false 183 | }, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "['#moe:ph4que 2014.09',\n", 189 | " '#pharmacophore 7 tag t value *',\n", 190 | " 'scheme t Unified matchsize i 0 use_Hs i 1 abspos i 0 title t $ useRval i 0 comment s $',\n", 191 | " '#feature 3 expr tt color ix x r y r z r r r ebits ix gbits ix m ix',\n", 192 | " 'Acc df2f2 3.877 7.014 1.448 0.3 0 400 a64cff ',\n", 193 | " 'Acc df2f2 7.22 11.077 5.625 0.3 0 400 a64cff ',\n", 194 | " 'Don f20df2 4.778 8.432 7.805 0.3 0 400 a64cff',\n", 195 | " '#endpharmacophore']" 196 | ] 197 | }, 198 | "execution_count": 7, 199 | "metadata": {}, 200 | "output_type": "execute_result" 201 | } 202 | ], 203 | "source": [ 204 | "moePh43Point.split(\"\\n\")" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "First we need to parse the MOE ph4 content and then convert it into something useful for the RDKit (a pharmacophore object)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 8, 217 | "metadata": { 218 | "collapsed": false 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "#########################\n", 223 | "# parse the ph4 content #\n", 224 | 
"#########################\n", 225 | "def parsePh4Content(moePh4List):\n", 226 | " ph4Info = {}\n", 227 | " header = []\n", 228 | " content = []\n", 229 | " for line in moePh4List.split(\"\\n\"):\n", 230 | " if line[0] == \"#\":\n", 231 | " if len(header) > 0:\n", 232 | " ph4Info[header[0]] = (header,content)\n", 233 | " header = []\n", 234 | " content = []\n", 235 | " header = line.strip().split()\n", 236 | " else:\n", 237 | " content.extend(line.strip().split())\n", 238 | " return ph4Info\n", 239 | "\n", 240 | "############################ \n", 241 | "# convert the feature info #\n", 242 | "############################\n", 243 | "def convertFeatureInfo(ph4InfoDict):\n", 244 | " feats = []\n", 245 | " radii=[]\n", 246 | " # this is the conversion of the moe ph4 nomenclature to the rdkit one\n", 247 | " moePh4Conv = {\"Acc\":\"Acceptor\",\"Don\":\"Donor\",\"Aro\":\"Aromatic\",\"Hyd\":\"Hydrophobe\"} \n", 248 | " header,content = ph4InfoDict['#feature']\n", 249 | " noFeats = int(header[1])\n", 250 | " lenSingleFeature = int((len(header)/2)-1)\n", 251 | " # loop over ph4 features and populate pharmacophore object\n", 252 | " for i in range(noFeats):\n", 253 | " featFamily = content[(lenSingleFeature*i)]\n", 254 | " (x,y,z) = (float(content[(lenSingleFeature*i)+2]),float(content[(lenSingleFeature*i)+3]),\n", 255 | " float(content[(lenSingleFeature*i)+4]))\n", 256 | " r = float(content[(lenSingleFeature*i)+5])\n", 257 | " feats.append(ChemicalFeatures.FreeChemicalFeature(moePh4Conv[featFamily],Geometry.Point3D(x,y,z)))\n", 258 | " radii.append(r)\n", 259 | " return (feats,radii)\n" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 9, 265 | "metadata": { 266 | "collapsed": true 267 | }, 268 | "outputs": [], 269 | "source": [ 270 | "############################\n", 271 | "# create the pharmacophore #\n", 272 | "############################\n", 273 | "ph4Info = parsePh4Content(moePh43Point)\n", 274 | "feats,radii = 
convertFeatureInfo(ph4Info)\n", 275 | "pcophore = Pharmacophore.Pharmacophore(feats)" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "Now that we have created the pharmacophore object, we need to modify the bounds matrix to incorporate the radii of the various ph4 features. To do this, we calculate the distance between each pair of features and subtract (lower bound) or add (upper bound) the sum of the respective feature radii (see image).\n", 283 | "\n", 284 | "![MOE PH4 distance example](images/1py5Ph4DistancesOrig.png)\n", 285 | "\n", 286 | "For example, the distance between F1:Acc and F2:Acc is 6.72 Angstrom and the corresponding feature radii are 1.0 Angstrom each. To include the fuzziness of the pharmacophore features, the lower bound is set to max(featureDistance-(F1Radius+F2Radius),0). The max() is applied here to avoid negative bounds values. The upper bound is handled the same way, except that the sum of the radii is added to the distance.\n", 287 | "\n", 288 | "![Fuzzy boundary matrix](images/1py5Ph4DistanceExample.png)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 10, 294 | "metadata": { 295 | "collapsed": false 296 | }, 297 | "outputs": [ 298 | { 299 | "data": { 300 | "text/plain": [ 301 | "array([[ 0. , 6.71795706, 6.57525467],\n", 302 | " [ 6.71795706, 0. , 4.20853763],\n", 303 | " [ 6.57525467, 4.20853763, 0. 
]])" 304 | ] 305 | }, 306 | "execution_count": 10, 307 | "metadata": {}, 308 | "output_type": "execute_result" 309 | } 310 | ], 311 | "source": [ 312 | "pcophore._boundsMat" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 11, 318 | "metadata": { 319 | "collapsed": false 320 | }, 321 | "outputs": [], 322 | "source": [ 323 | "def applyRadiiToBounds(radii,pcophore):\n", 324 | "    for i in range(len(radii)):\n", 325 | "        for j in range(i+1,len(radii)):\n", 326 | "            sumRadii = radii[i]+radii[j]\n", 327 | "            pcophore.setLowerBound(i,j,max(pcophore.getLowerBound(i,j)-sumRadii,0))\n", 328 | "            pcophore.setUpperBound(i,j,pcophore.getUpperBound(i,j)+sumRadii)\n", 329 | "applyRadiiToBounds(radii,pcophore)" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 12, 335 | "metadata": { 336 | "collapsed": false 337 | }, 338 | "outputs": [ 339 | { 340 | "data": { 341 | "text/plain": [ 342 | "array([[ 0. , 7.31795706, 7.17525467],\n", 343 | " [ 6.11795706, 0. , 4.80853763],\n", 344 | " [ 5.97525467, 3.60853763, 0. ]])" 345 | ] 346 | }, 347 | "execution_count": 12, 348 | "metadata": {}, 349 | "output_type": "execute_result" 350 | } 351 | ], 352 | "source": [ 353 | "pcophore._boundsMat" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "### Checking a molecule against a pharmacophore" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "The RDKit provides a simple method to check a pharmacophore against a molecule. The idea is to quickly filter out molecules for which an embedding is not possible. Essentially, the molecule is analyzed for its pharmacophoric features and those are checked against the features of the pharmacophore. The output is a boolean indicating whether or not the molecule can fulfill the pharmacophore purely based on its features, as well as a list of all possible mappings of the pharmacophore to the molecule." 
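The feature-level check can be pictured as collecting, for every pharmacophore feature, all molecule features of the same family, and reporting a possible match only if none of these lists is empty. The sketch below is a plain-Python illustration of that idea and does not reproduce the internals of EmbedLib; `candidate_matches` is a hypothetical helper, and the hard-coded feature list is an assumption taken from the families and atom ids that this notebook prints for the ligand.

```python
def candidate_matches(ph4_families, mol_features):
    """Return (can_match, candidates), with one candidate list per ph4 feature.

    mol_features is a list of (family, atom_ids) tuples.
    """
    candidates = []
    for family in ph4_families:
        # Keep every molecule feature whose family equals the ph4 feature's family.
        candidates.append([feat for feat in mol_features if feat[0] == family])
    can_match = all(len(found) > 0 for found in candidates)
    return can_match, candidates


# Families of the three-point pharmacophore defined above.
ph4_families = ["Acceptor", "Acceptor", "Donor"]
# Illustrative molecule features as (family, atom ids).
mol_features = [
    ("Acceptor", (5,)), ("Acceptor", (12,)), ("Acceptor", (19,)),
    ("Donor", (5,)), ("Donor", (6,)),
]

can_match, candidates = candidate_matches(ph4_families, mol_features)
for index, found in enumerate(candidates):
    for family, atom_ids in found:
        print(index, family, atom_ids)
```

A check of this kind looks only at feature families and ignores geometry, which is why a positive result still has to be confirmed against the distance bounds.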
368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 13, 373 | "metadata": { 374 | "collapsed": false 375 | }, 376 | "outputs": [], 377 | "source": [ 378 | "ligand = Chem.MolFromSmiles(\"c1ccc(-c2n[nH]cc2-c2ccnc3ccccc23)nc1\")" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 14, 384 | "metadata": { 385 | "collapsed": false 386 | }, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "image/png": "[base64-encoded PNG omitted: 2D depiction of the ligand]", 391 | "text/plain": [ 392 | "" 393 | ] 394 | }, 395 | "execution_count": 14, 396 | "metadata": {}, 397 | "output_type": "execute_result" 398 | } 399 | ], 400 | "source": [ 401 | "DrawingOptions.bondLineWidth=1.8\n", 402 | "DrawingOptions.atomLabelFontSize=14\n", 403 | "DrawingOptions.includeAtomNumbers=True\n", 404 | "ligand" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 15, 410 | 
"metadata": { 411 | "collapsed": false 412 | }, 413 | "outputs": [], 414 | "source": [ 415 | "canMatch,allMatches = EmbedLib.MatchPharmacophoreToMol(ligand,featFactory,pcophore) " 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": 16, 421 | "metadata": { 422 | "collapsed": false 423 | }, 424 | "outputs": [ 425 | { 426 | "data": { 427 | "text/plain": [ 428 | "True" 429 | ] 430 | }, 431 | "execution_count": 16, 432 | "metadata": {}, 433 | "output_type": "execute_result" 434 | } 435 | ], 436 | "source": [ 437 | "canMatch" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 17, 443 | "metadata": { 444 | "collapsed": false 445 | }, 446 | "outputs": [ 447 | { 448 | "name": "stdout", 449 | "output_type": "stream", 450 | "text": [ 451 | "0 Acceptor SingleAtomAcceptor (5,)\n", 452 | "0 Acceptor SingleAtomAcceptor (12,)\n", 453 | "0 Acceptor SingleAtomAcceptor (19,)\n", 454 | "1 Acceptor SingleAtomAcceptor (5,)\n", 455 | "1 Acceptor SingleAtomAcceptor (12,)\n", 456 | "1 Acceptor SingleAtomAcceptor (19,)\n", 457 | "2 Donor SingleAtomDonor (5,)\n", 458 | "2 Donor SingleAtomDonor (6,)\n" 459 | ] 460 | } 461 | ], 462 | "source": [ 463 | "for (i,match) in enumerate(allMatches):\n", 464 | " for f in match:\n", 465 | " print(\"%d %s %s %s\"%(i, f.GetFamily(), f.GetType(), f.GetAtomIds()))" 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "### Matching a molecule to a pharmacophore" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "Since the RDKit pharmacophore module is based on distance geometry, a molecule can be matched to a pharmacophore without actually aligning it to the pharmacophore. For this, the molecules' (smoothed) bounds matrix is updated with the distances of the pharmacophore bounds at the respective atom matches. Then, the algorithm tries to smooth the modified bounds matrix. 
" 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": 18, 485 | "metadata": { 486 | "collapsed": false 487 | }, 488 | "outputs": [], 489 | "source": [ 490 | "boundsMat = rdDistGeom.GetMoleculeBoundsMatrix(ligand)" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": 19, 496 | "metadata": { 497 | "collapsed": false 498 | }, 499 | "outputs": [], 500 | "source": [ 501 | "if canMatch:\n", 502 | " failed,boundsMatMatched,matched,matchDetails = EmbedLib.MatchPharmacophore(allMatches,boundsMat,\n", 503 | " pcophore,useDownsampling=False)" 504 | ] 505 | }, 506 | { 507 | "cell_type": "markdown", 508 | "metadata": {}, 509 | "source": [ 510 | "Apart from an indicator if the molecule could be matched to the pharmacophore (failed), a lot of useful information is returned. The molecules' smoothed bounds matrix (boundsMatMatched) as well as information of what atoms were matched to the pharmacophore (matched)." 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": 20, 516 | "metadata": { 517 | "collapsed": false 518 | }, 519 | "outputs": [ 520 | { 521 | "data": { 522 | "text/plain": [ 523 | "0" 524 | ] 525 | }, 526 | "execution_count": 20, 527 | "metadata": {}, 528 | "output_type": "execute_result" 529 | } 530 | ], 531 | "source": [ 532 | "failed" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 21, 538 | "metadata": { 539 | "collapsed": false 540 | }, 541 | "outputs": [ 542 | { 543 | "data": { 544 | "text/plain": [ 545 | "(21, 21)" 546 | ] 547 | }, 548 | "execution_count": 21, 549 | "metadata": {}, 550 | "output_type": "execute_result" 551 | } 552 | ], 553 | "source": [ 554 | "boundsMatMatched.shape" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": 22, 560 | "metadata": { 561 | "collapsed": false 562 | }, 563 | "outputs": [ 564 | { 565 | "name": "stdout", 566 | "output_type": "stream", 567 | "text": [ 568 | "Acceptor 12\n", 569 | "Acceptor 19\n", 570 | 
"Donor 5\n" 571 | ] 572 | } 573 | ], 574 | "source": [ 575 | "for match in matched:\n", 576 | " print(\"%s %d\"%(match.GetFamily(),match.GetAtomIds()[0]))" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "Computation times of bounds smoothing depend heavily on the size of the bounds matrix. To speed up things, the RDKit provides an argument to the MatchPharmacophore method (useDownsampling) which essentially reduces the bounds matrix to the relevant elements and then does the bounds smoothing on the reduced matrix. The matchDetails return value is a tuple that provides information on the size of the input bounds matrix as well as the output bounds matrix. If downsampling is not enabled both values are identical if not they differ where the second element is the reduced bounds matrix size." 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 23, 589 | "metadata": { 590 | "collapsed": false 591 | }, 592 | "outputs": [ 593 | { 594 | "data": { 595 | "text/plain": [ 596 | "(21, 21)" 597 | ] 598 | }, 599 | "execution_count": 23, 600 | "metadata": {}, 601 | "output_type": "execute_result" 602 | } 603 | ], 604 | "source": [ 605 | "matchDetails" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": 24, 611 | "metadata": { 612 | "collapsed": false 613 | }, 614 | "outputs": [], 615 | "source": [ 616 | "failed,boundsMatMatched,matched,matchDetails = EmbedLib.MatchPharmacophore(allMatches,boundsMat,\n", 617 | " pcophore,useDownsampling=True)" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 25, 623 | "metadata": { 624 | "collapsed": false 625 | }, 626 | "outputs": [ 627 | { 628 | "data": { 629 | "text/plain": [ 630 | "0" 631 | ] 632 | }, 633 | "execution_count": 25, 634 | "metadata": {}, 635 | "output_type": "execute_result" 636 | } 637 | ], 638 | "source": [ 639 | "failed" 640 | ] 641 | }, 642 | { 643 | "cell_type": "code", 644 | "execution_count": 
26, 645 | "metadata": { 646 | "collapsed": false 647 | }, 648 | "outputs": [ 649 | { 650 | "data": { 651 | "text/plain": [ 652 | "(21, 13)" 653 | ] 654 | }, 655 | "execution_count": 26, 656 | "metadata": {}, 657 | "output_type": "execute_result" 658 | } 659 | ], 660 | "source": [ 661 | "matchDetails" 662 | ] 663 | }, 664 | { 665 | "cell_type": "markdown", 666 | "metadata": {}, 667 | "source": [ 668 | "### Embedding a molecule onto a pharmacophore (NOT the actual alignment)" 669 | ] 670 | }, 671 | { 672 | "cell_type": "markdown", 673 | "metadata": {}, 674 | "source": [ 675 | "Now that we know that the molecule can be embedded onto the pharmacophore, and how (using the list of matches returned from MatchPharmacophore), embedding the molecule with the ph4 constraints is quite straightforward." 676 | ] 677 | }, 678 | { 679 | "cell_type": "code", 680 | "execution_count": 27, 681 | "metadata": { 682 | "collapsed": false 683 | }, 684 | "outputs": [], 685 | "source": [ 686 | "atomMatch = [list(x.GetAtomIds()) for x in matched]" 687 | ] 688 | }, 689 | { 690 | "cell_type": "code", 691 | "execution_count": 28, 692 | "metadata": { 693 | "collapsed": false 694 | }, 695 | "outputs": [ 696 | { 697 | "data": { 698 | "text/plain": [ 699 | "[[12], [19], [5]]" 700 | ] 701 | }, 702 | "execution_count": 28, 703 | "metadata": {}, 704 | "output_type": "execute_result" 705 | } 706 | ], 707 | "source": [ 708 | "atomMatch" 709 | ] 710 | }, 711 | { 712 | "cell_type": "markdown", 713 | "metadata": {}, 714 | "source": [ 715 | "To avoid really bad conformations, hydrogens should be added at this point (it helps a bit)." 716 | ] 717 | }, 718 | { 719 | "cell_type": "code", 720 | "execution_count": 29, 721 | "metadata": { 722 | "collapsed": true 723 | }, 724 | "outputs": [], 725 | "source": [ 726 | "ligH = Chem.AddHs(ligand)" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": 30, 732 | "metadata": { 733 | "collapsed": false 734 | }, 735 | "outputs": [],
736 | "source": [ 737 | "bm,embeddings,numFail = EmbedLib.EmbedPharmacophore(ligH,atomMatch,pcophore,count=10)" 738 | ] 739 | }, 740 | { 741 | "cell_type": "code", 742 | "execution_count": 31, 743 | "metadata": { 744 | "collapsed": false 745 | }, 746 | "outputs": [ 747 | { 748 | "data": { 749 | "text/plain": [ 750 | "0" 751 | ] 752 | }, 753 | "execution_count": 31, 754 | "metadata": {}, 755 | "output_type": "execute_result" 756 | } 757 | ], 758 | "source": [ 759 | "numFail" 760 | ] 761 | }, 762 | { 763 | "cell_type": "code", 764 | "execution_count": 32, 765 | "metadata": { 766 | "collapsed": false 767 | }, 768 | "outputs": [ 769 | { 770 | "data": { 771 | "text/plain": [ 772 | "10" 773 | ] 774 | }, 775 | "execution_count": 32, 776 | "metadata": {}, 777 | "output_type": "execute_result" 778 | } 779 | ], 780 | "source": [ 781 | "len(embeddings)" 782 | ] 783 | }, 784 | { 785 | "cell_type": "markdown", 786 | "metadata": {}, 787 | "source": [ 788 | "In this case none of the embeddings failed. Note - at this stage the molecule is just embedded onto the pharmacophore and not aligned to it. " 789 | ] 790 | }, 791 | { 792 | "cell_type": "markdown", 793 | "metadata": {}, 794 | "source": [ 795 | "### Aligning an embedding to a pharmacophore" 796 | ] 797 | }, 798 | { 799 | "cell_type": "markdown", 800 | "metadata": {}, 801 | "source": [ 802 | "The alignment itself can be done with code provided by the RDKit. We start by generating the coordinates of the matching pharmacophoric points. Since a single match can consist of multiple atoms, we cannot just use the atomIds and do the alignment based on those. Instead, we compute the centroid (the arithmetic mean of the atom coordinates) of each match and align these points to the pharmacophore features.
" 803 | ] 804 | }, 805 | { 806 | "cell_type": "code", 807 | "execution_count": 33, 808 | "metadata": { 809 | "collapsed": false 810 | }, 811 | "outputs": [ 812 | { 813 | "data": { 814 | "text/plain": [ 815 | "[0.3324263571321069,\n", 816 | " 0.33586698144527105,\n", 817 | " 0.32520986638642313,\n", 818 | " 0.32671130271598514,\n", 819 | " 0.3421387037292547,\n", 820 | " 0.3224447247298343,\n", 821 | " 0.3425788770183189,\n", 822 | " 0.3339505837944188,\n", 823 | " 0.31849689244806356,\n", 824 | " 0.3298604798363485]" 825 | ] 826 | }, 827 | "execution_count": 33, 828 | "metadata": {}, 829 | "output_type": "execute_result" 830 | } 831 | ], 832 | "source": [ 833 | "def GetTransformMatrix(alignRef,confEmbed,atomMatch):\n", 834 | " alignProbe = []\n", 835 | " for matchIds in atomMatch:\n", 836 | " dummyPoint = Geometry.Point3D(0.0,0.0,0.0)\n", 837 | " for id in matchIds:\n", 838 | " dummyPoint += confEmbed.GetAtomPosition(id)\n", 839 | " dummyPoint /= len(matchIds)\n", 840 | " alignProbe.append(dummyPoint)\n", 841 | " return (rdAlignment.GetAlignmentTransform(alignRef,alignProbe))\n", 842 | "\n", 843 | "def TransformEmbeddings(pcophore,embeddings,atomMatch):\n", 844 | " alignRef = [f.GetPos() for f in pcophore.getFeatures()]\n", 845 | " SSDs = []\n", 846 | " for embedding in embeddings:\n", 847 | " conf = embedding.GetConformer()\n", 848 | " SSD,transformMatrix = GetTransformMatrix(alignRef,conf,atomMatch)\n", 849 | " rdMolTransforms.TransformConformer(conf,transformMatrix)\n", 850 | " SSDs.append(SSD)\n", 851 | " return(SSDs)\n", 852 | "\n", 853 | "TransformEmbeddings(pcophore,embeddings,atomMatch)" 854 | ] 855 | }, 856 | { 857 | "cell_type": "markdown", 858 | "metadata": {}, 859 | "source": [ 860 | "All alignments seem to work pretty well (low squared deviation values). 
This is also true when looking at how the pharmacophoric features of the ligand fit the pharmacophoric query.\n", 861 | "\n", 862 | "![Aligned Embeddings](images/alignEmbed3Point.png)\n", 863 | "\n", 864 | "Unfortunately, the query wasn't set up very well with respect to \"fixing\" the molecules, i.e. the quinoline as well as the back-pocket pyridine binding moiety are spread all over 3D space. This can be improved by a modified query that also contains an additional aromatic feature next to the hinge binder.\n", 865 | "\n", 866 | "![4point ph4](images/1py5Ph44pointph4InPocket.png)\n", 867 | "\n", 868 | "Running the same alignment as above ..." 869 | ] 870 | }, 871 | { 872 | "cell_type": "code", 873 | "execution_count": 37, 874 | "metadata": { 875 | "collapsed": true 876 | }, 877 | "outputs": [], 878 | "source": [ 879 | "moePh44Point = \"\"\"#moe:ph4que 2014.09\n", 880 | "#pharmacophore 7 tag t value *\n", 881 | "scheme t Unified matchsize i 0 use_Hs i 1 abspos i 0 title t $ useRval i 0 comment s $\n", 882 | "#feature 4 expr tt color ix x r y r z r r r ebits ix gbits ix m ix\n", 883 | "Acc df2f2 3.877 7.014 1.448 0.3 0 400 a64cff \n", 884 | "Acc df2f2 7.22 11.077 5.625 0.3 0 400 a64cff \n", 885 | "Don f20df2 4.778 8.432 7.805 0.3 0 400 a64cff \n", 886 | "Aro ff8000 1.56433333333334 7.06399999999999 3.135 1.5 0 400 a64cff\n", 887 | "#endpharmacophore\"\"\"\n" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": 39, 893 | "metadata": { 894 | "collapsed": false 895 | }, 896 | "outputs": [ 897 | { 898 | "data": { 899 | "text/plain": [ 900 | "[12.247548982275546,\n", 901 | " 11.111090082701395,\n", 902 | " 3.826678589119382,\n", 903 | " 11.66292566298035,\n", 904 | " 8.204876289025478,\n", 905 | " 0.8101042567665502,\n", 906 | " 8.125825259923843,\n", 907 | " 5.6827776196306345,\n", 908 | " 6.945428752476992,\n", 909 | " 0.7102500150487856]" 910 | ] 911 | }, 912 | "execution_count": 39, 913 | "metadata": {}, 914 | "output_type": "execute_result"
915 | } 916 | ], 917 | "source": [ 918 | "ph4Info = parsePh4Content(moePh44Point)\n", 919 | "feats,radii = convertFeatureInfo(ph4Info)\n", 920 | "pcophore = Pharmacophore.Pharmacophore(feats)\n", 921 | "applyRadiiToBounds(radii,pcophore)\n", 922 | "ligand = Chem.MolFromSmiles(\"c1ccc(-c2n[nH]cc2-c2ccnc3ccccc23)nc1\")\n", 923 | "boundsMat = rdDistGeom.GetMoleculeBoundsMatrix(ligand)\n", 924 | "canMatch,allMatches = EmbedLib.MatchPharmacophoreToMol(ligand,featFactory,pcophore)\n", 925 | "if canMatch:\n", 926 | " failed,boundsMatMatched,matched,matchDetails = EmbedLib.MatchPharmacophore(allMatches,boundsMat,\n", 927 | " pcophore,useDownsampling=True)\n", 928 | "atomMatch = [list(x.GetAtomIds()) for x in matched]\n", 929 | "ligH = Chem.AddHs(ligand)\n", 930 | "bm,embeddings,numFail = EmbedLib.EmbedPharmacophore(ligH,atomMatch,pcophore,count=10)\n", 931 | "TransformEmbeddings(pcophore,embeddings,atomMatch)" 932 | ] 933 | }, 934 | { 935 | "cell_type": "markdown", 936 | "metadata": {}, 937 | "source": [ 938 | "Adding constraints to the embedding clearly worsens the SSD values for many alignments. Put differently, even though the distance matrix allows the ph4 to be fit, only a few of the embeddings can be properly fitted onto the pharmacophore when the actual matching in 3D space happens. Still, alignment 6, for example, shows a low SSD value and the actual alignment looks good.\n", 939 | "\n", 940 | "![Alignment 6](images/4PointAlignment6.png)\n", 941 | "\n", 942 | "Note that in this match the H-bond donor is matched to the pyrazole nitrogen without the hydrogen atom. This is a consequence of the definitions in the BaseFeatures.fdef file, which reflect the potential tautomerism of the pyrazole group. \n", 943 | "This is what we would get if we \"cheat\" the unwanted donor match (atom 5, the nitrogen without the hydrogen) away (note that the first full match of the pharmacophore feature mappings is used in the embedding). We could achieve the same by changing the definitions in the BaseFeatures file, but this is easier for now."
944 | ] 945 | }, 946 | { 947 | "cell_type": "code", 948 | "execution_count": 41, 949 | "metadata": { 950 | "collapsed": false 951 | }, 952 | "outputs": [ 953 | { 954 | "name": "stdout", 955 | "output_type": "stream", 956 | "text": [ 957 | "0 Acceptor SingleAtomAcceptor (5,)\n", 958 | "0 Acceptor SingleAtomAcceptor (12,)\n", 959 | "0 Acceptor SingleAtomAcceptor (19,)\n", 960 | "1 Acceptor SingleAtomAcceptor (5,)\n", 961 | "1 Acceptor SingleAtomAcceptor (12,)\n", 962 | "1 Acceptor SingleAtomAcceptor (19,)\n", 963 | "2 Donor SingleAtomDonor (5,)\n", 964 | "2 Donor SingleAtomDonor (6,)\n", 965 | "3 Aromatic Arom5 (4, 5, 6, 7, 8)\n", 966 | "3 Aromatic Arom6 (0, 1, 2, 3, 19, 20)\n", 967 | "3 Aromatic Arom6 (9, 10, 11, 12, 13, 18)\n", 968 | "3 Aromatic Arom6 (13, 14, 15, 16, 17, 18)\n" 969 | ] 970 | } 971 | ], 972 | "source": [ 973 | "for (i,match) in enumerate(allMatches):\n", 974 | " for f in match:\n", 975 | " print(\"%d %s %s %s\"%(i, f.GetFamily(), f.GetType(), f.GetAtomIds()))" 976 | ] 977 | }, 978 | { 979 | "cell_type": "code", 980 | "execution_count": 42, 981 | "metadata": { 982 | "collapsed": false 983 | }, 984 | "outputs": [], 985 | "source": [ 986 | "allMatches[2] = (allMatches[2][1],)" 987 | ] 988 | }, 989 | { 990 | "cell_type": "code", 991 | "execution_count": 43, 992 | "metadata": { 993 | "collapsed": false 994 | }, 995 | "outputs": [ 996 | { 997 | "name": "stdout", 998 | "output_type": "stream", 999 | "text": [ 1000 | "0 Acceptor SingleAtomAcceptor (5,)\n", 1001 | "0 Acceptor SingleAtomAcceptor (12,)\n", 1002 | "0 Acceptor SingleAtomAcceptor (19,)\n", 1003 | "1 Acceptor SingleAtomAcceptor (5,)\n", 1004 | "1 Acceptor SingleAtomAcceptor (12,)\n", 1005 | "1 Acceptor SingleAtomAcceptor (19,)\n", 1006 | "2 Donor SingleAtomDonor (6,)\n", 1007 | "3 Aromatic Arom5 (4, 5, 6, 7, 8)\n", 1008 | "3 Aromatic Arom6 (0, 1, 2, 3, 19, 20)\n", 1009 | "3 Aromatic Arom6 (9, 10, 11, 12, 13, 18)\n", 1010 | "3 Aromatic Arom6 (13, 14, 15, 16, 17, 18)\n" 1011 | ] 1012 | } 1013 | ], 
1014 | "source": [ 1015 | "for (i,match) in enumerate(allMatches):\n", 1016 | " for f in match:\n", 1017 | " print(\"%d %s %s %s\"%(i, f.GetFamily(), f.GetType(), f.GetAtomIds()))" 1018 | ] 1019 | }, 1020 | { 1021 | "cell_type": "code", 1022 | "execution_count": 45, 1023 | "metadata": { 1024 | "collapsed": false 1025 | }, 1026 | "outputs": [ 1027 | { 1028 | "data": { 1029 | "text/plain": [ 1030 | "[1.8460812176612649,\n", 1031 | " 11.59755647293786,\n", 1032 | " 2.4969798350293004,\n", 1033 | " 12.165228799867712,\n", 1034 | " 8.098470400659394,\n", 1035 | " 0.701126156123749,\n", 1036 | " 8.488922578961997,\n", 1037 | " 8.821280152742723,\n", 1038 | " 9.94333946662374,\n", 1039 | " 0.637983263851595]" 1040 | ] 1041 | }, 1042 | "execution_count": 45, 1043 | "metadata": {}, 1044 | "output_type": "execute_result" 1045 | } 1046 | ], 1047 | "source": [ 1048 | "ligand = Chem.MolFromSmiles(\"c1ccc(-c2n[nH]cc2-c2ccnc3ccccc23)nc1\")\n", 1049 | "boundsMat = rdDistGeom.GetMoleculeBoundsMatrix(ligand)\n", 1050 | "failed,boundsMatMatched,matched,matchDetails = EmbedLib.MatchPharmacophore(allMatches,boundsMat,\n", 1051 | " pcophore,useDownsampling=True)\n", 1052 | "atomMatch = [list(x.GetAtomIds()) for x in matched]\n", 1053 | "ligH = Chem.AddHs(ligand)\n", 1054 | "bm,embeddings,numFail = EmbedLib.EmbedPharmacophore(ligH,atomMatch,pcophore,count=10)\n", 1055 | "TransformEmbeddings(pcophore,embeddings,atomMatch)" 1056 | ] 1057 | }, 1058 | { 1059 | "cell_type": "markdown", 1060 | "metadata": {}, 1061 | "source": [ 1062 | "In this case, alignment 6 shows a good SSD value which is also reflected by a good alignment (this time with the correct Donor alignment).\n", 1063 | "\n", 1064 | "![Donor alignment](images/4PointAlignmentCheat6N.png)" 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "markdown", 1069 | "metadata": {}, 1070 | "source": [ 1071 | "# Caveat !!" 
1072 | ] 1073 | }, 1074 | { 1075 | "cell_type": "markdown", 1076 | "metadata": {}, 1077 | "source": [ 1078 | "One of the next steps could be to completely \"pin down\" the ph4 so that the compound really fits the ph4. In order to do that, the back-pocket pyridine would need to be brought into plane with the rest of the molecule. To achieve that, the aromatic ph4 feature needs to be included. \n", 1079 | "This is where things can get tricky ... in this case the acceptor atom of the pyridine ring is also part of the aromatic feature, so the two sets of distance constraints interfere during bounds smoothing and prevent a successful match. " 1080 | ] 1081 | }, 1082 | { 1083 | "cell_type": "code", 1084 | "execution_count": 47, 1085 | "metadata": { 1086 | "collapsed": true 1087 | }, 1088 | "outputs": [], 1089 | "source": [ 1090 | "moePh45Point = \"\"\"#moe:ph4que 2014.09\n", 1091 | "#pharmacophore 7 tag t value *\n", 1092 | "scheme t Unified matchsize i 0 use_Hs i 1 abspos i 0 title t $ useRval i\n", 1093 | "0 comment s $\n", 1094 | "#feature 5 expr tt color ix x r y r z r r r ebits ix gbits ix m ix\n", 1095 | "Acc df2f2 3.877 7.014 1.448 1 0 400 a64cff \n", 1096 | "Acc df2f2 7.22 11.077 5.625 1 0 400 a64cff \n", 1097 | "Don f20df2 4.778 8.432 7.805 1 0 400 a64cff \n", 1098 | "Aro ff8000 1.56433333333334 7.06399999999999 3.135 1 0 400 a64cff \n", 1099 | "Aro ff8000 6.68983333333333 11.6213333333333 4.498 1 0 400 a64cff\n", 1100 | "#endpharmacophore\"\"\"" 1101 | ] 1102 | }, 1103 | { 1104 | "cell_type": "code", 1105 | "execution_count": 48, 1106 | "metadata": { 1107 | "collapsed": false 1108 | }, 1109 | "outputs": [], 1110 | "source": [ 1111 | "ph4Info = parsePh4Content(moePh45Point)\n", 1112 | "feats,radii = convertFeatureInfo(ph4Info)\n", 1113 | "pcophore = Pharmacophore.Pharmacophore(feats)\n", 1114 | "applyRadiiToBounds(radii,pcophore)\n", 1115 | "ligand = Chem.MolFromSmiles(\"c1ccc(-c2n[nH]cc2-c2ccnc3ccccc23)nc1\")\n", 1116 | "boundsMat =
rdDistGeom.GetMoleculeBoundsMatrix(ligand)\n", 1117 | "canMatch,allMatches = EmbedLib.MatchPharmacophoreToMol(ligand,featFactory,pcophore)\n", 1118 | "if canMatch:\n", 1119 | " failed,boundsMatMatched,matched,matchDetails = EmbedLib.MatchPharmacophore(allMatches,boundsMat,\n", 1120 | " pcophore,useDownsampling=True)" 1121 | ] 1122 | }, 1123 | { 1124 | "cell_type": "code", 1125 | "execution_count": 49, 1126 | "metadata": { 1127 | "collapsed": false 1128 | }, 1129 | "outputs": [ 1130 | { 1131 | "data": { 1132 | "text/plain": [ 1133 | "1" 1134 | ] 1135 | }, 1136 | "execution_count": 49, 1137 | "metadata": {}, 1138 | "output_type": "execute_result" 1139 | } 1140 | ], 1141 | "source": [ 1142 | "failed" 1143 | ] 1144 | }, 1145 | { 1146 | "cell_type": "markdown", 1147 | "metadata": {}, 1148 | "source": [ 1149 | "__When setting up ph4 queries, make sure that multi-atom pharmacophoric features do not include other pharmacophoric features!__ " 1150 | ] 1151 | }, 1152 | { 1153 | "cell_type": "markdown", 1154 | "metadata": {}, 1155 | "source": [ 1156 | "# Running over multiple molecules" 1157 | ] 1158 | }, 1159 | { 1160 | "cell_type": "markdown", 1161 | "metadata": {}, 1162 | "source": [ 1163 | "Now that we know how to align the co-crystallised ligand back to its pharmacophore query, we can run a couple of molecules that should fit into the pocket of that target enzyme (here Ren et al. doi:10.1016/j.ejmech.2009.07.008). We take the first 5 molecules from the list of actives and use the 4-point pharmacophore from above, but with a radius of 1.0."
1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "code", 1168 | "execution_count": 50, 1169 | "metadata": { 1170 | "collapsed": true 1171 | }, 1172 | "outputs": [], 1173 | "source": [ 1174 | "moePh44Point = \"\"\"#moe:ph4que 2014.09\n", 1175 | "#pharmacophore 7 tag t value *\n", 1176 | "scheme t Unified matchsize i 0 use_Hs i 1 abspos i 0 title t $ useRval i 0 comment s $\n", 1177 | "#feature 4 expr tt color ix x r y r z r r r ebits ix gbits ix m ix\n", 1178 | "Acc df2f2 3.877 7.014 1.448 1.0 0 400 a64cff \n", 1179 | "Acc df2f2 7.22 11.077 5.625 1.0 0 400 a64cff \n", 1180 | "Don f20df2 4.778 8.432 7.805 1.0 0 400 a64cff \n", 1181 | "Aro ff8000 1.56433333333334 7.06399999999999 3.135 1.0 0 400 a64cff\n", 1182 | "#endpharmacophore\"\"\"" 1183 | ] 1184 | }, 1185 | { 1186 | "cell_type": "code", 1187 | "execution_count": 51, 1188 | "metadata": { 1189 | "collapsed": false 1190 | }, 1191 | "outputs": [ 1192 | { 1193 | "name": "stdout", 1194 | "output_type": "stream", 1195 | "text": [ 1196 | "0\n", 1197 | "1\n", 1198 | "2\n", 1199 | "3\n", 1200 | "Couldn't embed molecule 3\n", 1201 | "4\n" 1202 | ] 1203 | } 1204 | ], 1205 | "source": [ 1206 | "from operator import itemgetter\n", 1207 | "ph4Info = parsePh4Content(moePh44Point)\n", 1208 | "feats,radii = convertFeatureInfo(ph4Info)\n", 1209 | "pcophore = Pharmacophore.Pharmacophore(feats)\n", 1210 | "applyRadiiToBounds(radii,pcophore)\n", 1211 | "molSmiles = ['Cc1cccc(c2n[nH]cc2c3ccc4ncccc4n3)n1','Cc1cccnc1c2nc(N)sc2c3nc4cccnc4cc3',\n", 1212 | " 'Cc1cccc(c2[nH]c(CNc5cc(C(=O)N)ccc5)nc2c3ccc4nccnc4c3)n1','Clc1cccc(c2nc(N)sc2c3ccc4ncccc4n3)c1',\n", 1213 | " 'n1ccccc1c2nn3CCCc3c2c4ccnc5cc(NC(=O)NCCN(C)C)ccc45']\n", 1214 | "mols = [Chem.MolFromSmiles(smi) for smi in molSmiles]\n", 1215 | "res = []\n", 1216 | "for i,mol in enumerate(mols):\n", 1217 | " print(i)\n", 1218 | " boundsMat = rdDistGeom.GetMoleculeBoundsMatrix(mol)\n", 1219 | " canMatch,allMatches = EmbedLib.MatchPharmacophoreToMol(mol,featFactory,pcophore)\n", 1220 | " if 
canMatch:\n", 1221 | " failed,boundsMatMatched,matched,matchDetails = EmbedLib.MatchPharmacophore(allMatches,boundsMat,\n", 1222 | " pcophore,useDownsampling=True)\n", 1223 | " if failed:\n", 1224 | " print(\"Couldn't embed molecule %d\"%i)\n", 1225 | " continue\n", 1226 | " else:\n", 1227 | " print(\"Couldn't match molecule %d\"%i)\n", 1228 | " continue\n", 1229 | " atomMatch = [list(x.GetAtomIds()) for x in matched]\n", 1230 | " try:\n", 1231 | " molH = Chem.AddHs(mol)\n", 1232 | " bm,embeddings,numFail = EmbedLib.EmbedPharmacophore(molH,atomMatch,pcophore,count=10)\n", 1233 | " except ValueError:\n", 1234 | " print (\"Bounds smoothing failed for molecule %d\"%i)\n", 1235 | " continue\n", 1236 | " SSDs = TransformEmbeddings(pcophore,embeddings,atomMatch) \n", 1237 | " bestFitIndex = min(enumerate(SSDs), key=itemgetter(1))[0] \n", 1238 | " res.append((SSDs[bestFitIndex],embeddings[bestFitIndex]))" 1239 | ] 1240 | }, 1241 | { 1242 | "cell_type": "code", 1243 | "execution_count": 52, 1244 | "metadata": { 1245 | "collapsed": false 1246 | }, 1247 | "outputs": [ 1248 | { 1249 | "data": { 1250 | "text/plain": [ 1251 | "[(5.4789081986121175, ),\n", 1252 | " (6.390745025093736, ),\n", 1253 | " (5.4231998457926665, ),\n", 1254 | " (6.303848386994872, )]" 1255 | ] 1256 | }, 1257 | "execution_count": 52, 1258 | "metadata": {}, 1259 | "output_type": "execute_result" 1260 | } 1261 | ], 1262 | "source": [ 1263 | "res" 1264 | ] 1265 | }, 1266 | { 1267 | "cell_type": "markdown", 1268 | "metadata": {}, 1269 | "source": [ 1270 | "As the SSD values show, the overall alignment to the pharmacophore is not very good. This is also visible when looking at how the molecules are aligned:\n", 1271 | "\n", 1272 | "![multiMol alignment](images/4PointAlignmentAllN.png)\n", 1273 | "\n", 1274 | "Of course - this doesn't mean that the method itself is not good - it just shows that the molecules do not fit the pharmacophore.
The above example is merely there to show how to run the method on multiple molecules.\n", 1275 | "In addition, it shows that one can use the basic alignment values to quickly remove molecules that do not properly fit the pharmacophore. However, be cautious, since the conformations of the aligned molecules are not really good - but that's for another tutorial ... " 1276 | ] 1277 | }, 1278 | { 1279 | "cell_type": "code", 1280 | "execution_count": null, 1281 | "metadata": { 1282 | "collapsed": true 1283 | }, 1284 | "outputs": [], 1285 | "source": [] 1286 | } 1287 | ], 1288 | "metadata": { 1289 | "kernelspec": { 1290 | "display_name": "Python 2", 1291 | "language": "python", 1292 | "name": "python2" 1293 | }, 1294 | "language_info": { 1295 | "codemirror_mode": { 1296 | "name": "ipython", 1297 | "version": 2 1298 | }, 1299 | "file_extension": ".py", 1300 | "mimetype": "text/x-python", 1301 | "name": "python", 1302 | "nbconvert_exporter": "python", 1303 | "pygments_lexer": "ipython2", 1304 | "version": "2.7.11" 1305 | } 1306 | }, 1307 | "nbformat": 4, 1308 | "nbformat_minor": 0 1309 | } 1310 | -------------------------------------------------------------------------------- /Notebooks/data/Target_no_65.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/data/Target_no_65.pkl -------------------------------------------------------------------------------- /Notebooks/images/1py5Ph44pointph4InPocket.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/1py5Ph44pointph4InPocket.png -------------------------------------------------------------------------------- /Notebooks/images/1py5Ph4DistanceExample.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/1py5Ph4DistanceExample.png -------------------------------------------------------------------------------- /Notebooks/images/1py5Ph4DistancesOrig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/1py5Ph4DistancesOrig.png -------------------------------------------------------------------------------- /Notebooks/images/4PointAlignment3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/4PointAlignment3.png -------------------------------------------------------------------------------- /Notebooks/images/4PointAlignment6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/4PointAlignment6.png -------------------------------------------------------------------------------- /Notebooks/images/4PointAlignmentAll.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/4PointAlignmentAll.png -------------------------------------------------------------------------------- /Notebooks/images/4PointAlignmentAllN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/4PointAlignmentAllN.png -------------------------------------------------------------------------------- /Notebooks/images/4PointAlignmentCheat6.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/4PointAlignmentCheat6.png -------------------------------------------------------------------------------- /Notebooks/images/4PointAlignmentCheat6N.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/4PointAlignmentCheat6N.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_coords_and_smiles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_coords_and_smiles.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_coords_and_smiles_out.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_coords_and_smiles_out.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_descriptors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_descriptors.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_descriptors_out.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_descriptors_out.png 
-------------------------------------------------------------------------------- /Notebooks/images/KNIME_descrs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_descrs.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_descrs_missing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_descrs_missing.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_generate_3d_coords.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_generate_3d_coords.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_generate_3d_coords_out.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_generate_3d_coords_out.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_generate_confs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_generate_confs.png -------------------------------------------------------------------------------- /Notebooks/images/KNIME_generate_confs_out.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_generate_confs_out.png
--------------------------------------------------------------------------------
/Notebooks/images/KNIME_sanitization.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/KNIME_sanitization.png
--------------------------------------------------------------------------------
/Notebooks/images/T5.shaded.132.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/T5.shaded.132.png
--------------------------------------------------------------------------------
/Notebooks/images/alignEmbed3Point.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/alignEmbed3Point.png
--------------------------------------------------------------------------------
/Notebooks/images/alignEmbed3PointN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/alignEmbed3PointN.png
--------------------------------------------------------------------------------
/Notebooks/images/docs_overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/docs_overview.png
--------------------------------------------------------------------------------
/Notebooks/images/docs_zoom.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/docs_zoom.png
--------------------------------------------------------------------------------
/Notebooks/images/ecodesystem.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/ecodesystem.png
--------------------------------------------------------------------------------
/Notebooks/images/kinaseOverview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/kinaseOverview.png
--------------------------------------------------------------------------------
/Notebooks/images/logo.lrg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/logo.lrg.png
--------------------------------------------------------------------------------
/Notebooks/images/ph4_tutorial.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/ph4_tutorial.png
--------------------------------------------------------------------------------
/Notebooks/images/tutorial_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Notebooks/images/tutorial_example.png
--------------------------------------------------------------------------------
/Presentations/BrianKelley-NovartisChemicalUniverse.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/BrianKelley-NovartisChemicalUniverse.pdf
--------------------------------------------------------------------------------
/Presentations/Brown_OriginsOf3D.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Brown_OriginsOf3D.pdf
--------------------------------------------------------------------------------
/Presentations/Ehmki_and_KramerMatchedMolecularSeries.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Ehmki_and_KramerMatchedMolecularSeries.pdf
--------------------------------------------------------------------------------
/Presentations/Flachsenberg_RingDecomposerLib.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Flachsenberg_RingDecomposerLib.pdf
--------------------------------------------------------------------------------
/Presentations/Godin_OneCentralTool_Lightning.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Godin_OneCentralTool_Lightning.pdf
--------------------------------------------------------------------------------
/Presentations/JohnMayfield_Depiction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/JohnMayfield_Depiction.pdf
--------------------------------------------------------------------------------
/Presentations/Landrum_Schneider_GitHub_Git_and_RDKit.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Landrum_Schneider_GitHub_Git_and_RDKit.pdf
--------------------------------------------------------------------------------
/Presentations/Pahl_NotebookTools_Intro.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Pahl_NotebookTools_Intro.pdf
--------------------------------------------------------------------------------
/Presentations/PaoloTosco_OpenMM_RDKit_integration.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/PaoloTosco_OpenMM_RDKit_integration.pdf
--------------------------------------------------------------------------------
/Presentations/README.md:
--------------------------------------------------------------------------------
Placeholder
--------------------------------------------------------------------------------
/Presentations/Sayle_RDKitTautomers.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Sayle_RDKitTautomers.pdf
--------------------------------------------------------------------------------
/Presentations/Schwarze_RDKit_UGM_Oct_2016_How_to_Develop_New_RDKit_Nodes.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Schwarze_RDKit_UGM_Oct_2016_How_to_Develop_New_RDKit_Nodes.pdf
--------------------------------------------------------------------------------
/Presentations/Schwarze_RDKit_UGM_Oct_2016_Workshop_Writing_RDKit_KNIME_Nodes_Hands-On.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Schwarze_RDKit_UGM_Oct_2016_Workshop_Writing_RDKit_KNIME_Nodes_Hands-On.pdf
--------------------------------------------------------------------------------
/Presentations/SelectivityMaps.py:
--------------------------------------------------------------------------------
# $Id$
#
# Copyright (c) 2016, Novartis Institutes for BioMedical Research Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
#     * Redistributions of source code must retain the above copyright
#       notice, this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above
#       copyright notice, this list of conditions and the following
#       disclaimer in the documentation and/or other materials provided
#       with the distribution.
#     * Neither the name of Novartis Institutes for BioMedical Research Inc.
#       nor the names of its contributors may be used to endorse or promote
#       products derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Based on SimilarityMaps.py by Sereina Riniker August 2013
#
# Created by Simon Ruedisser, September 2016

import math
import numpy
import matplotlib.pyplot as plt
from matplotlib import cm
from rdkit.Chem import Draw


def getProbaprod(fp, predictionFunction, target_nr=0):
    """
    Probability that a molecule is selective for the target given by target_nr:
    the probability of being active on that target multiplied by the
    probabilities of being inactive on the other two targets.

    The log-probabilities (predict_log_proba) are summed and exponentiated once
    to avoid underflow errors for small probabilities.
    """
    p = predictionFunction(fp)
    # class index per target: 1 for target_nr, 0 for the other two targets
    c = [0, 0, 0]
    c[target_nr] = 1
    s = p[0][0][c[0]] + p[1][0][c[1]] + p[2][0][c[2]]
    return math.exp(s)


def GetAtomicWeightsForModel(probeMol, fpFunction, predictionFunction):
    """
    Calculates the atomic weights for the probe molecule based on
    a fingerprint function and the prediction function of a ML model.

    Parameters:
      probeMol -- the probe molecule
      fpFunction -- the fingerprint function
      predictionFunction -- the prediction function of the ML model
    """
    if hasattr(probeMol, '_fpInfo'):
        delattr(probeMol, '_fpInfo')
    # baseline prediction for the full molecule (atomId -1: no atom removed)
    probeFP = fpFunction(probeMol, -1)
    baseProba = predictionFunction(probeFP)
    # loop over atoms
    weights = []
    for atomId in range(probeMol.GetNumAtoms()):
        newFP = fpFunction(probeMol, atomId)
        newProba = predictionFunction(newFP)
        weights.append(baseProba - newProba)
    if hasattr(probeMol, '_fpInfo'):
        delattr(probeMol, '_fpInfo')
    return weights


def GetStandardizedWeights(weights, weightsScaling=True):
    """
    Normalizes the weights,
    such that the absolute maximum weight equals 1.0.

    Parameters:
      weights -- the list with the atomic weights
      weightsScaling -- if False, the weights are not normalized

    Returns the (possibly normalized) weights and the absolute maximum weight.
    """
    tmp = [math.fabs(w) for w in weights]
    currentMax = max(tmp)
    # logical 'and' instead of bitwise '&', so any truthy flag value works
    if currentMax > 0 and weightsScaling:
        return [w / currentMax for w in weights], currentMax
    else:
        return weights, currentMax


def GetSimilarityMapFromWeights(mol, weights, weightsScaling=True, colorMap=cm.PiYG, scale=-1, size=(250, 250), sigma=None,  #@UndefinedVariable #pylint: disable=E1101
                                coordScale=1.5, step=0.01, colors='k', contourLines=10, alpha=0.5, **kwargs):
    """
    Generates the similarity map for a molecule given the atomic weights.

    Parameters:
      mol -- the molecule of interest
      weights -- the atomic weights
      weightsScaling -- not used by this function
      colorMap -- the matplotlib color map scheme
      scale -- the scaling: scale < 0 -> the absolute maximum weight is used as maximum scale
                            scale = double -> this is the maximum scale
      size -- the size of the figure
      sigma -- the sigma for the Gaussians
      coordScale -- scaling factor for the coordinates
      step -- the step for calcAtomGaussian
      colors -- color of the contour lines
      contourLines -- if integer number N: N contour lines are drawn
                      if list(numbers): contour lines at these numbers are drawn
      alpha -- the alpha blending value for the contour lines
      kwargs -- additional arguments for drawing
    """
    if mol.GetNumAtoms() < 2:
        raise ValueError("too few atoms")
    fig = Draw.MolToMPL(mol, coordScale=coordScale, size=size, **kwargs)
    if sigma is None:
        # default sigma: 0.3 times the drawn length of the first bond
        # (or the distance between the first two atoms if there are no bonds)
        if mol.GetNumBonds() > 0:
            bond = mol.GetBondWithIdx(0)
            idx1 = bond.GetBeginAtomIdx()
            idx2 = bond.GetEndAtomIdx()
            sigma = 0.3 * math.sqrt(sum([(mol._atomPs[idx1][i] - mol._atomPs[idx2][i])**2 for i in range(2)]))
        else:
            sigma = 0.3 * math.sqrt(sum([(mol._atomPs[0][i] - mol._atomPs[1][i])**2 for i in range(2)]))
        sigma = round(sigma, 2)
    x, y, z = Draw.calcAtomGaussians(mol, sigma, weights=weights, step=step)
    # scaling
    if scale <= 0.0:
        maxScale = max(math.fabs(numpy.min(z)), math.fabs(numpy.max(z)))
    else:
        maxScale = scale
    # coloring
    if math.fabs(maxScale) < 1:
        maxScale = 1
    fig.axes[0].imshow(z, cmap=colorMap, interpolation='bilinear', origin='lower', extent=(0, 1, 0, 1), vmin=-maxScale, vmax=maxScale)
    # contour lines
    # only draw lines if at least one weight is not zero
    if any(w != 0.0 for w in weights):
        fig.axes[0].contour(x, y, z, contourLines, colors=colors, alpha=alpha, **kwargs)
    return fig


def GetSimilarityMapForModel(probeMol, fpFunction, predictionFunction, weightsScaling=True, **kwargs):
    """
    Generates the similarity map for a given ML model and probe molecule,
    and fingerprint function.

    Parameters:
      probeMol -- the probe molecule
      fpFunction -- the fingerprint function
      predictionFunction -- the prediction function of the ML model
      weightsScaling -- if False, the weights are not normalized
      kwargs -- additional arguments for drawing

    Returns the figure and the absolute maximum atomic weight.
    """
    weights = GetAtomicWeightsForModel(probeMol, fpFunction, predictionFunction)
    weights, maxWeight = GetStandardizedWeights(weights, weightsScaling)
    fig = GetSimilarityMapFromWeights(probeMol, weights, **kwargs)
    return fig, maxWeight
--------------------------------------------------------------------------------
/Presentations/Vianello_FasterSimilarityQueries.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdkit/UGM_2016/292ba0fac10f9ca610e872b201f7a84d748aab95/Presentations/Vianello_FasterSimilarityQueries.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# UGM 2016
Materials from the 2016 RDKit UGM in Basel.

- - - - - - -

## Lightning Talks

**One central tool for Cheminformatics**

Guillaume Godin

Slides: [Presentations/Godin_OneCentralTool_Lightning.pdf](Presentations/Godin_OneCentralTool_Lightning.pdf)

- - - - - - -

**3D pharmacophores in the RDKit**

Nikolaus Stiefl

Notebook: [Notebooks/Stiefl_RDKitPh4FullPublication.ipynb](Notebooks/Stiefl_RDKitPh4FullPublication.ipynb)

- - - - - - -

## Presentations / Tutorials

Greg Landrum

State of the toolkit: [Notebooks/State of the toolkit.distrib.ipynb](Notebooks/State%20of%20the%20toolkit.distrib.ipynb)

What's new: [Notebooks/Whats New.ipynb](Notebooks/Whats%20New.ipynb)

Bonus material, not actually from the UGM:

A brief introduction to the RDKit: [Notebooks/Brief Introduction.ipynb](Notebooks/Brief%20Introduction.ipynb)

- - - - - - -
**An RDKit-centric intro to git and github**

Greg Landrum and Nadine Schneider

Slides: [Presentations/Landrum_Schneider_GitHub_Git_and_RDKit.pdf](Presentations/Landrum_Schneider_GitHub_Git_and_RDKit.pdf)

- - - - - - -
**Molecule selectivity prediction for serine proteases**

Simon Ruedisser

Notebook: [Presentations/Ruedisser_SelectivityProteases_talk.ipynb](Presentations/Ruedisser_SelectivityProteases_talk.ipynb)

- - - - - - -
**Chemically meaningful ring perception: An open-source implementation of the Unique Ring Families approach**

Florian Flachsenberg

Slides: [Presentations/Flachsenberg_RingDecomposerLib.pdf](Presentations/Flachsenberg_RingDecomposerLib.pdf)

- - - - - - -
**Boosting RDKit molecular simulations through OpenMM**

Paolo Tosco

Slides: [Presentations/PaoloTosco_OpenMM_RDKit_integration.pdf](Presentations/PaoloTosco_OpenMM_RDKit_integration.pdf)

- - - - - - -
**Data Pipelines and Mol_Lists: RDKit Tools for the Jupyter Notebook**

Axel Pahl

Notebook: [Notebooks/Pahl_NotebookTools_Tutorial.ipynb](Notebooks/Pahl_NotebookTools_Tutorial.ipynb)

- - - - - - -
**Matched Molecular Series: Measuring SAR Transferability**

Emanuel Ehmki and Christian Kramer

Slides: [Presentations/Ehmki_and_KramerMatchedMolecularSeries.pdf](Presentations/Ehmki_and_KramerMatchedMolecularSeries.pdf)

- - - - - - -
**How to develop RDKit nodes for KNIME workflows**

Manuel Schwarze

Slides: [Presentations/Schwarze_RDKit_UGM_Oct_2016_How_to_Develop_New_RDKit_Nodes.pdf](Presentations/Schwarze_RDKit_UGM_Oct_2016_How_to_Develop_New_RDKit_Nodes.pdf)

- - - - - - -
**Five not-so-easy pieces: Tautomers, nucleic acids, inorganics, reactions, and SMIRKS**

Roger Sayle

Slides: [Presentations/Sayle_RDKitTautomers.pdf](Presentations/Sayle_RDKitTautomers.pdf)

- - - - - - -
**The origin of three dimensionality in drug-like molecules**

Nathan Brown

Slides: [Presentations/Brown_OriginsOf3D.pdf](Presentations/Brown_OriginsOf3D.pdf)

- - - - - - -
**The Novartis Chemical Universe - Searching astronomically large spaces**

Brian Kelley

Slides: [Presentations/BrianKelley-NovartisChemicalUniverse.pdf](Presentations/BrianKelley-NovartisChemicalUniverse.pdf)

- - - - - - -
**Higher Quality Chemical Depictions: Lessons Learned and Advice**

John Mayfield

Slides: [Presentations/JohnMayfield_Depiction.pdf](Presentations/JohnMayfield_Depiction.pdf)

- - - - - - -
**Faster similarity queries using RDKit, binary fingerprints and relational databases**

Riccardo Vianello

Slides: [Presentations/Vianello_FasterSimilarityQueries.pdf](Presentations/Vianello_FasterSimilarityQueries.pdf)

- - - - - - -
**QSAR/ML tutorial**

Nikolas Fechner

Notebooks:
- [Part 1](Tutorials/Part1_Toy_data_example_and_overfitting_risks.ipynb)
- [Part 2](Tutorials/Part2_Descriptors_and_regression.ipynb)
- [Part 3](Tutorials/Part3_Fingerprints_and_classification.ipynb)

- - - - - - -
**Implementing RDKit nodes for KNIME**

Manuel Schwarze

Slides: [Presentations/Schwarze_RDKit_UGM_Oct_2016_Workshop_Writing_RDKit_KNIME_Nodes_Hands-On.pdf](Presentations/Schwarze_RDKit_UGM_Oct_2016_Workshop_Writing_RDKit_KNIME_Nodes_Hands-On.pdf)
--------------------------------------------------------------------------------
/Tutorials/README.md:
--------------------------------------------------------------------------------
Placeholder
--------------------------------------------------------------------------------
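
The selectivity score described in the `getProbaprod` docstring of `Presentations/SelectivityMaps.py` (probability of activity on one target multiplied by the probabilities of inactivity on the others, accumulated in log space) can be sketched in isolation. This is a minimal illustration and not the author's code: the function name `selectivity_probability`, the `(log P(inactive), log P(active))` pair layout, and the example probabilities are all assumptions made for the sketch.

```python
import math


def selectivity_probability(log_probas, target_nr=0):
    """Return P(active on target_nr) * product of P(inactive on each other target).

    log_probas is a hypothetical stand-in for the model output: one entry per
    target, each a pair (log P(inactive), log P(active)).
    """
    total = 0.0
    for idx, (log_inactive, log_active) in enumerate(log_probas):
        # Use the "active" class for the target of interest, "inactive" otherwise.
        total += log_active if idx == target_nr else log_inactive
    # Summing logs and exponentiating once avoids underflow in the product.
    return math.exp(total)


# Made-up probabilities for three targets, given as (P(inactive), P(active)).
probas = [(0.1, 0.9), (0.8, 0.2), (0.7, 0.3)]
log_probas = [(math.log(p0), math.log(p1)) for p0, p1 in probas]
score = selectivity_probability(log_probas, target_nr=0)
```

With these made-up numbers the score is the product 0.9 * 0.8 * 0.7. Working with log-probabilities matters mainly when many small probabilities are multiplied; for three targets the benefit is modest, but the pattern is the same.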